2021-06-14T16:12:45+08:00


        *** GPGPU-Sim Simulator Version 3.2.2  [build 0] ***


GPGPU-Sim PTX: simulation mode 0 (can change with PTX_SIM_MODE_FUNC environment variable:
               1=functional simulation only, 0=detailed performance simulator)
GPGPU-Sim: Configuration options:

-network_mode                           1 # Interconnection network mode
-inter_config_file   config_fermi_islip.icnt # Interconnection network config file
-gpgpu_ptx_use_cuobjdump                    1 # Use cuobjdump to extract ptx and sass from binaries
-gpgpu_experimental_lib_support                    0 # Try to extract code from cuda libraries [Broken because of unknown cudaGetExportTable]
-gpgpu_ptx_convert_to_ptxplus                    0 # Convert SASS (native ISA) to ptxplus and run ptxplus
-gpgpu_ptx_force_max_capability                   20 # Force maximum compute capability
-gpgpu_ptx_inst_debug_to_file                    0 # Dump executed instructions' debug information to file
-gpgpu_ptx_inst_debug_file       inst_debug.txt # Executed instructions' debug output file
-gpgpu_ptx_inst_debug_thread_uid                    1 # Thread UID for executed instructions' debug output
-gpgpu_simd_model                       1 # 1 = post-dominator
-gpgpu_shader_core_pipeline              1536:32 # shader core pipeline config, i.e., {<nthread>:<warpsize>}
-gpgpu_tex_cache:l1  4:128:24,L:R:m:N:L,F:128:4,128:2 # per-shader L1 texture cache  (READ-ONLY) config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq>:<rf>}
-gpgpu_const_cache:l1 64:64:2,L:R:f:N:L,A:2:32,4 # per-shader L1 constant memory cache  (READ-ONLY) config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq>} 
-gpgpu_cache:il1     4:128:4,L:R:f:N:L,A:2:32,4 # shader L1 instruction cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq>} 
-gpgpu_cache:dl1     32:128:4,L:L:m:N:H,A:32:8,8 # per-shader L1 data cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq> | none}
-gpgpu_cache:dl1PrefL1                 none # per-shader L1 data cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq> | none}
-gpgpu_cache:dl1PreShared                 none # per-shader L1 data cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq> | none}
-gmem_skip_L1D                          0 # global memory access skip L1D cache (implements -Xptxas -dlcm=cg, default=no skip)
-gpgpu_perfect_mem                      0 # enable perfect memory mode (no cache miss)
-n_regfile_gating_group                    4 # group of lanes that should be read/written together
-gpgpu_clock_gated_reg_file                    0 # enable clock gated reg file for power calculations
-gpgpu_clock_gated_lanes                    0 # enable clock gated lanes for power calculations
-gpgpu_shader_registers                32768 # Number of registers per shader core. Limits number of concurrent CTAs. (default 8192)
-gpgpu_shader_cta                       8 # Maximum number of concurrent CTAs in shader (default 8)
-gpgpu_num_cta_barriers                   16 # Maximum number of named barriers per CTA (default 16)
-gpgpu_n_clusters                      15 # number of processing clusters
-gpgpu_n_cores_per_cluster                    8 # number of simd cores per cluster
-gpgpu_n_cluster_ejection_buffer_size                    8 # number of packets in ejection buffer
-gpgpu_n_ldst_response_buffer_size                    2 # number of response packets in ld/st unit ejection buffer
-gpgpu_shmem_size                   16384 # Size of shared memory per shader core (default 16kB)
-gpgpu_shmem_size                   49152 # Size of shared memory per shader core (default 16kB)
-gpgpu_shmem_size_PrefL1                16384 # Size of shared memory per shader core (default 16kB)
-gpgpu_shmem_size_PrefShared                16384 # Size of shared memory per shader core (default 16kB)
-gpgpu_shmem_num_banks                   32 # Number of banks in the shared memory in each shader core (default 16)
-gpgpu_shmem_limited_broadcast                    0 # Limit shared memory to do one broadcast per cycle (default on)
-gpgpu_shmem_warp_parts                    1 # Number of portions a warp is divided into for shared memory bank conflict check 
-gpgpu_warpdistro_shader                   -1 # Specify which shader core to collect the warp size distribution from
-gpgpu_warp_issue_shader                    0 # Specify which shader core to collect the warp issue distribution from
-gpgpu_local_mem_map                    1 # Mapping from local memory space address to simulated GPU physical address space (default = enabled)
-gpgpu_num_reg_banks                   16 # Number of register banks (default = 8)
-gpgpu_reg_bank_use_warp_id                    0 # Use warp ID in mapping registers to banks (default = off)
-gpgpu_operand_collector_num_units_sp                    6 # number of collector units (default = 4)
-gpgpu_operand_collector_num_units_sfu                    8 # number of collector units (default = 4)
-gpgpu_operand_collector_num_units_mem                    2 # number of collector units (default = 2)
-gpgpu_operand_collector_num_units_gen                    0 # number of collector units (default = 0)
-gpgpu_operand_collector_num_in_ports_sp                    2 # number of collector unit in ports (default = 1)
-gpgpu_operand_collector_num_in_ports_sfu                    1 # number of collector unit in ports (default = 1)
-gpgpu_operand_collector_num_in_ports_mem                    1 # number of collector unit in ports (default = 1)
-gpgpu_operand_collector_num_in_ports_gen                    0 # number of collector unit in ports (default = 0)
-gpgpu_operand_collector_num_out_ports_sp                    2 # number of collector unit out ports (default = 1)
-gpgpu_operand_collector_num_out_ports_sfu                    1 # number of collector unit out ports (default = 1)
-gpgpu_operand_collector_num_out_ports_mem                    1 # number of collector unit out ports (default = 1)
-gpgpu_operand_collector_num_out_ports_gen                    0 # number of collector unit out ports (default = 0)
-gpgpu_coalesce_arch                   13 # Coalescing arch (default = 13, anything else is off for now)
-gpgpu_num_sched_per_core                    2 # Number of warp schedulers per core
-gpgpu_max_insn_issue_per_warp                    1 # Max number of instructions that can be issued per warp in one cycle by scheduler
-gpgpu_simt_core_sim_order                    1 # Select the simulation order of cores in a cluster (0=Fix, 1=Round-Robin)
-gpgpu_pipeline_widths        2,1,1,2,1,1,2 # Pipeline widths ID_OC_SP,ID_OC_SFU,ID_OC_MEM,OC_EX_SP,OC_EX_SFU,OC_EX_MEM,EX_WB
-gpgpu_num_sp_units                     2 # Number of SP units (default=1)
-gpgpu_num_sfu_units                    1 # Number of SF units (default=1)
-gpgpu_num_mem_units                    1 # Number of ldst units (default=1) WARNING: not hooked up to anything
-gpgpu_scheduler                      gto # Scheduler configuration: < lrr | gto | two_level_active > If two_level_active: <num_active_warps>:<inner_prioritization>:<outer_prioritization>. For complete list of prioritization values see shader.h enum scheduler_prioritization_type. Default: gto
-gpgpu_dram_scheduler                    1 # 0 = fifo, 1 = FR-FCFS (default)
-gpgpu_dram_partition_queues              8:8:8:8 # i2$:$2d:d2$:$2i
-l2_ideal                               0 # Use an ideal L2 cache that always hits
-gpgpu_cache:dl2     64:128:8,L:B:m:W:L,A:32:4,4:0,32 # unified banked L2 data cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq>}
-gpgpu_cache:dl2_texture_only                    0 # L2 cache used for texture only
-gpgpu_n_mem                            6 # number of memory modules (e.g. memory controllers) in gpu
-gpgpu_n_sub_partition_per_mchannel                    2 # number of memory subpartition in each memory module
-gpgpu_n_mem_per_ctrlr                    2 # number of memory chips per memory controller
-gpgpu_memlatency_stat                   14 # track and display latency statistics 0x2 enables MC, 0x4 enables queue logs
-gpgpu_frfcfs_dram_sched_queue_size                   16 # 0 = unlimited (default); # entries per chip
-gpgpu_dram_return_queue_size                  116 # 0 = unlimited (default); # entries per chip
-gpgpu_dram_buswidth                    4 # default = 4 bytes (8 bytes per cycle at DDR)
-gpgpu_dram_burst_length                    8 # Burst length of each DRAM request (default = 4 data bus cycle)
-dram_data_command_freq_ratio                    4 # Frequency ratio between DRAM data bus and command bus (default = 2 times, i.e. DDR)
-gpgpu_dram_timing_opt nbk=16:CCD=2:RRD=6:RCD=12:RAS=28:RP=12:RC=40: CL=12:WL=4:CDLR=5:WR=12:nbkgrp=4:CCDL=3:RTPL=2 # DRAM timing parameters = {nbk:tCCD:tRRD:tRCD:tRAS:tRP:tRC:CL:WL:tCDLR:tWR:nbkgrp:tCCDL:tRTPL}
-rop_latency                          120 # ROP queue latency (default 85)
-dram_latency                         100 # DRAM latency (default 30)
-gpgpu_mem_addr_mapping dramid@8;00000000.00000000.00000000.00000000.0000RRRR.RRRRRRRR.BBBCCCCB.CCSSSSSS # mapping memory address to dram model {dramid@<start bit>;<memory address map>}
-gpgpu_mem_addr_test                    0 # run sweep test to check address mapping for aliased address
-gpgpu_mem_address_mask                    1 # 0 = old addressing mask, 1 = new addressing mask, 2 = new add. mask + flipped bank sel and chip sel bits
-gpuwattch_xml_file  gpuwattch_gtx480.xml # GPUWattch XML file
-power_simulation_enabled                    1 # Turn on power simulator (1=On, 0=Off)
-power_per_cycle_dump                    0 # Dump detailed power output each cycle
-power_trace_enabled                    0 # produce a file for the power trace (1=On, 0=Off)
-power_trace_zlevel                     6 # Compression level of the power trace output log (0=no comp, 9=highest)
-steady_power_levels_enabled                    0 # produce a file for the steady power levels (1=On, 0=Off)
-steady_state_definition                  8:4 # allowed deviation:number of samples
-gpgpu_max_cycle                        0 # terminates gpu simulation early (0 = no limit)
-gpgpu_max_insn                         0 # terminates gpu simulation early (0 = no limit)
-gpgpu_max_cta                          0 # terminates gpu simulation early (0 = no limit)
-gpgpu_runtime_stat                   500 # display runtime statistics such as dram utilization {<freq>:<flag>}
-liveness_message_freq                    1 # Minimum number of seconds between simulation liveness messages (0 = always print)
-gpgpu_flush_l1_cache                    0 # Flush L1 cache at the end of each kernel call
-gpgpu_flush_l2_cache                    0 # Flush L2 cache at the end of each kernel call
-gpgpu_deadlock_detect                    1 # Stop the simulation at deadlock (1=on (default), 0=off)
-gpgpu_ptx_instruction_classification                    0 # if enabled will classify ptx instruction types per kernel (Max 255 kernels now)
-gpgpu_ptx_sim_mode                     0 # Select between Performance (default) or Functional simulation (1)
-gpgpu_clock_domains 700.0:700.0:700.0:924.0 # Clock Domain Frequencies in MHz {<Core Clock>:<ICNT Clock>:<L2 Clock>:<DRAM Clock>}
-gpgpu_max_concurrent_kernel                    8 # maximum kernels that can run concurrently on GPU
-gpgpu_cflog_interval                    0 # Interval between each snapshot in control flow logger
-visualizer_enabled                     0 # Turn on visualizer output (1=On, 0=Off)
-visualizer_outputfile                 NULL # Specifies the output log file for visualizer
-visualizer_zlevel                      6 # Compression level of the visualizer output log (0=no comp, 9=highest)
-trace_enabled                          0 # Turn on traces
-trace_components                    none # comma separated list of traces to enable. Complete list found in trace_streams.tup. Default none
-trace_sampling_core                    0 # The core which is printed using CORE_DPRINTF. Default 0
-trace_sampling_memory_partition                   -1 # The memory partition which is printed using MEMPART_DPRINTF. Default -1 (i.e. all)
-enable_ptx_file_line_stats                    1 # Turn on PTX source line statistic profiling. (1 = On)
-ptx_line_stats_filename gpgpu_inst_stats.txt # Output file for PTX source line statistics.
-save_embedded_ptx                      0 # saves ptx files embedded in binary as <n>.ptx
-keep                                   0 # keep intermediate files created by GPGPU-Sim when interfacing with external programs
-gpgpu_ptx_save_converted_ptxplus                    0 # Save converted ptxplus to a file
-ptx_opcode_latency_int         4,13,4,5,145 # Opcode latencies for integers <ADD,MAX,MUL,MAD,DIV> Default 1,1,19,25,145
-ptx_opcode_latency_fp          4,13,4,5,39 # Opcode latencies for single precision floating points <ADD,MAX,MUL,MAD,DIV> Default 1,1,1,1,30
-ptx_opcode_latency_dp         8,19,8,8,330 # Opcode latencies for double precision floating points <ADD,MAX,MUL,MAD,DIV> Default 8,8,8,8,335
-ptx_opcode_initiation_int            1,2,2,1,8 # Opcode initiation intervals for integers <ADD,MAX,MUL,MAD,DIV> Default 1,1,4,4,32
-ptx_opcode_initiation_fp            1,2,1,1,4 # Opcode initiation intervals for single precision floating points <ADD,MAX,MUL,MAD,DIV> Default 1,1,1,1,5
-ptx_opcode_initiation_dp         8,16,8,8,130 # Opcode initiation intervals for double precision floating points <ADD,MAX,MUL,MAD,DIV> Default 8,8,8,8,130
DRAM Timing Options:
nbk                                    16 # number of banks
CCD                                     2 # column to column delay
RRD                                     6 # minimal delay between activation of rows in different banks
RCD                                    12 # row to column delay
RAS                                    28 # time needed to activate row
RP                                     12 # time needed to precharge (deactivate) row
RC                                     40 # row cycle time
CDLR                                    5 # switching from write to read (changes tWTR)
WR                                     12 # last data-in to row precharge
CL                                     12 # CAS latency
WL                                      4 # Write latency
nbkgrp                                  4 # number of bank groups
CCDL                                    3 # column to column delay between accesses to different bank groups
RTPL                                    2 # read to precharge delay between accesses to different bank groups
Total number of memory sub partition = 12
addr_dec_mask[CHIP]  = 0000000000000000 	high:64 low:0
addr_dec_mask[BK]    = 000000000000e100 	high:16 low:8
addr_dec_mask[ROW]   = 000000000fff0000 	high:28 low:16
addr_dec_mask[COL]   = 0000000000001eff 	high:13 low:0
addr_dec_mask[BURST] = 000000000000003f 	high:6 low:0
sub_partition_id_mask = 0000000000000100
GPGPU-Sim uArch: clock freqs: 700000000.000000:700000000.000000:700000000.000000:924000000.000000
GPGPU-Sim uArch: clock periods: 0.00000000142857142857:0.00000000142857142857:0.00000000142857142857:0.00000000108225108225
*** Initializing Memory Statistics ***
GPGPU-Sim uArch: interconnect node map (shaderID+MemID to icntID)
GPGPU-Sim uArch: Memory nodes ID start from index: 15
GPGPU-Sim uArch:    0   1   2   3   4
GPGPU-Sim uArch:    5   6   7   8   9
GPGPU-Sim uArch:   10  11  12  13  14
GPGPU-Sim uArch:   15  16  17  18  19
GPGPU-Sim uArch:   20  21  22  23  24
GPGPU-Sim uArch:   25  26
GPGPU-Sim uArch: interconnect node reverse map (icntID to shaderID+MemID)
GPGPU-Sim uArch: Memory nodes start from ID: 15
GPGPU-Sim uArch:    0   1   2   3   4
GPGPU-Sim uArch:    5   6   7   8   9
GPGPU-Sim uArch:   10  11  12  13  14
GPGPU-Sim uArch:   15  16  17  18  19
GPGPU-Sim uArch:   20  21  22  23  24
GPGPU-Sim uArch:   25  26
8b51d2418a0658287a30fe3c4cc1fd21  /home/ly/下载/test/gpgpu-sim_distribution-master/ispass2009-benchmarks-master_2/bin/release/MM
GPGPU-Sim uArch: performance model initialization complete.
GPGPU-Sim PTX: __cudaRegisterFatBinary, fat_cubin_handle = 1, filename=mm.cu
self exe links to: /home/ly/下载/test/gpgpu-sim_distribution-master/ispass2009-benchmarks-master_2/bin/release/MM
Running md5sum using "md5sum /home/ly/下载/test/gpgpu-sim_distribution-master/ispass2009-benchmarks-master_2/bin/release/MM "
Running cuobjdump using "$CUDA_INSTALL_PATH/bin/cuobjdump -ptx -elf -sass /home/ly/下载/test/gpgpu-sim_distribution-master/ispass2009-benchmarks-master_2/bin/release/MM > _cuobjdump_complete_output_J2MfVd"
Parsing file _cuobjdump_complete_output_J2MfVd
######### cuobjdump parser ########
## Adding new section ELF
Adding arch: sm_10
Adding identifier: mm.cu
## Adding new section PTX
Adding ptx filename: _cuobjdump_1.ptx
Adding arch: sm_10
Adding identifier: mm.cu
## Adding new section ELF
Adding arch: sm_20
Adding identifier: mm.cu
## Adding new section PTX
Adding ptx filename: _cuobjdump_2.ptx
Adding arch: sm_20
Adding identifier: mm.cu
Done parsing!!!
GPGPU-Sim PTX: __cudaRegisterFunction _Z14matrix_mul_gpuPiS_S_i : hostFun 0x0x400ce0, fat_cubin_handle = 1
GPGPU-Sim PTX: instruction assembly for function '_Z14matrix_mul_gpuPiS_S_i'...   done.
GPGPU-Sim PTX: finding reconvergence points for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: Finding dominators for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: Finding immediate dominators for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: Finding postdominators for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: Finding immediate postdominators for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: pre-decoding instructions for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: reconvergence points for _Z14matrix_mul_gpuPiS_S_i...
GPGPU-Sim PTX:  1 (potential) branch divergence @  PC=0x048 (_1.ptx:71) @%p1 bra $Lt_0_2306;
GPGPU-Sim PTX:    immediate post dominator      @  PC=0x170 (_1.ptx:114) ld.param.u64 %rd11, [__cudaparm__Z14matrix_mul_gpuPiS_S_i_P];
GPGPU-Sim PTX:  2 (potential) branch divergence @  PC=0x130 (_1.ptx:103) @%p2 bra $Lt_0_1794;
GPGPU-Sim PTX:    immediate post dominator      @  PC=0x138 (_1.ptx:104) bra.uni $Lt_0_1282;
GPGPU-Sim PTX:  3 (potential) branch divergence @  PC=0x138 (_1.ptx:104) bra.uni $Lt_0_1282;
GPGPU-Sim PTX:    immediate post dominator      @  PC=0x170 (_1.ptx:114) ld.param.u64 %rd11, [__cudaparm__Z14matrix_mul_gpuPiS_S_i_P];
GPGPU-Sim PTX: ... end of reconvergence points for _Z14matrix_mul_gpuPiS_S_i
GPGPU-Sim PTX: ... done pre-decoding instructions for '_Z14matrix_mul_gpuPiS_S_i'.
GPGPU-Sim PTX: finished parsing EMBEDDED .ptx file _1.ptx
Adding _cuobjdump_2.ptx with cubin handle 1
GPGPU-Sim PTX: extracting embedded .ptx to temporary file "_ptx_41MzfA"
Running: cat _ptx_41MzfA | sed 's/.version 1.5/.version 1.4/' | sed 's/, texmode_independent//' | sed 's/\(\.extern \.const\[1\] .b8 \w\+\)\[\]/\1\[1\]/' | sed 's/const\[.\]/const\[0\]/g' > _ptx2_7ItUzW
GPGPU-Sim PTX: generating ptxinfo using "$CUDA_INSTALL_PATH/bin/ptxas --gpu-name=sm_20 -v _ptx2_7ItUzW --output-file  /dev/null 2> _ptx_41MzfAinfo"
GPGPU-Sim PTX: Kernel '_Z14matrix_mul_gpuPiS_S_i' : regs=14, lmem=0, smem=0, cmem=60
GPGPU-Sim PTX: removing ptxinfo using "rm -f _ptx_41MzfA _ptx2_7ItUzW _ptx_41MzfAinfo"
GPGPU-Sim PTX: loading globals with explicit initializers... 
GPGPU-Sim PTX: finished loading globals (0 bytes total).
GPGPU-Sim PTX: loading constants with explicit initializers...  done.
Block(10,10)   Grid(15,15).

GPGPU-Sim PTX: cudaLaunch for 0x0x400ce0 (mode=performance simulation) on stream 0
GPGPU-Sim PTX: pushing kernel '_Z14matrix_mul_gpuPiS_S_i' to stream 0, gridDim= (15,15,1) blockDim = (10,10,1) 
kernel '_Z14matrix_mul_gpuPiS_S_i' transfer to GPU hardware scheduler
GPGPU-Sim uArch: Shader 8 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: CTA/core = 8, limited by: cta_limit
GPGPU-Sim uArch: core:  8, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 16 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 16, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 24 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 24, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 32 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 32, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 40 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 40, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 48 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 48, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 56 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 56, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 64 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 64, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 72 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 72, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 80 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 80, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 88 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 88, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 96 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 96, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 104 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:104, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 112 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:112, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 0 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  0, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 9 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  9, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 17 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 17, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 25 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 25, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 33 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 33, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 41 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 41, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 49 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 49, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 57 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 57, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 65 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 65, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 73 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 73, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 81 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 81, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 89 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 89, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 97 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 97, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 105 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:105, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 113 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:113, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 1 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  1, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 10 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 10, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 18 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 18, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 26 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 26, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 34 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 34, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 42 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 42, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 50 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 50, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 58 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 58, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 66 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 66, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 74 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 74, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 82 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 82, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 90 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 90, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 98 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 98, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 106 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:106, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 114 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:114, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 2 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  2, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 11 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 11, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 19 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 19, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 27 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 27, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 35 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 35, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 43 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 43, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 51 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 51, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 59 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 59, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 67 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 67, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 75 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 75, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 83 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 83, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 91 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 91, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 99 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 99, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 107 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:107, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 115 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:115, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 3 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  3, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 12 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 12, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 20 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 20, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 28 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 28, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 36 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 36, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 44 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 44, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 52 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 52, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 60 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 60, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 68 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 68, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 76 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 76, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 84 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 84, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 92 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 92, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 100 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:100, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 108 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:108, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 116 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:116, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 4 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  4, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 13 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 13, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 21 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 21, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 29 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 29, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 37 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 37, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 45 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 45, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 53 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 53, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 61 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 61, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 69 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 69, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 77 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 77, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 85 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 85, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 93 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 93, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 101 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:101, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 109 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:109, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 117 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:117, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 5 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  5, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 14 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 14, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 22 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 22, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 30 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 30, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 38 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 38, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 46 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 46, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 54 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 54, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 62 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 62, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 70 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 70, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 78 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 78, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 86 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 86, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 94 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 94, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 102 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:102, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 110 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:110, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 118 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:118, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 6 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  6, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 15 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 15, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 23 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 23, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 31 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 31, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 39 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 39, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 47 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 47, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 55 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 55, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 63 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 63, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 71 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 71, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 79 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 79, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 87 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 87, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 95 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 95, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 103 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:103, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 111 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:111, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 119 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:119, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 7 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  7, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: core:  8, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 16, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 24, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 32, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 40, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 48, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 56, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 64, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 72, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 80, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 88, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 96, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core:104, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core:112, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core:  0, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core:  9, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 17, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 25, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 33, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 41, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 49, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 57, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 65, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 73, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 81, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 89, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 97, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core:105, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core:113, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core:  1, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 10, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 18, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 26, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 34, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 42, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 50, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 58, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 66, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 74, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 82, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 90, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 98, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core:106, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core:114, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core:  2, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 11, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 19, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 27, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 35, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 43, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 51, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 59, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 67, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 75, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 83, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 91, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 99, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core:107, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core:115, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core:  3, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 12, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 20, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 28, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 36, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 44, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 52, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 60, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 68, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 76, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 84, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 92, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core:100, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core:108, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core:116, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core:  4, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 13, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 21, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 29, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 37, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 45, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 53, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 61, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 69, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 77, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 85, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 93, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core:101, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core:109, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core:117, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core:  5, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 14, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 22, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 30, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 38, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 46, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 54, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 62, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 70, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 78, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 86, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 94, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core:102, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core:110, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core:118, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core:  6, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: cycles simulated: 500  inst.: 49456 (ipc=98.9) sim_rate=49456 (inst/sec) elapsed = 0:0:00:01 / Mon Jun 14 16:12:49 2021
GPGPU-Sim PTX: 100000 instructions simulated : ctaid=(2,12,0) tid=(5,7,0)
GPGPU-Sim uArch: cycles simulated: 1000  inst.: 155464 (ipc=155.5) sim_rate=77732 (inst/sec) elapsed = 0:0:00:02 / Mon Jun 14 16:12:50 2021
GPGPU-Sim PTX: 200000 instructions simulated : ctaid=(1,9,0) tid=(7,2,0)
GPGPU-Sim PTX: 300000 instructions simulated : ctaid=(13,12,0) tid=(9,1,0)
GPGPU-Sim uArch: cycles simulated: 1500  inst.: 294800 (ipc=196.5) sim_rate=98266 (inst/sec) elapsed = 0:0:00:03 / Mon Jun 14 16:12:51 2021
GPGPU-Sim PTX: 400000 instructions simulated : ctaid=(6,0,0) tid=(5,9,0)
GPGPU-Sim PTX: 500000 instructions simulated : ctaid=(3,11,0) tid=(7,4,0)
GPGPU-Sim uArch: cycles simulated: 2000  inst.: 460980 (ipc=230.5) sim_rate=115245 (inst/sec) elapsed = 0:0:00:04 / Mon Jun 14 16:12:52 2021
GPGPU-Sim PTX: 600000 instructions simulated : ctaid=(0,5,0) tid=(5,5,0)
GPGPU-Sim PTX: 700000 instructions simulated : ctaid=(3,7,0) tid=(1,1,0)
GPGPU-Sim uArch: cycles simulated: 2500  inst.: 658596 (ipc=263.4) sim_rate=131719 (inst/sec) elapsed = 0:0:00:05 / Mon Jun 14 16:12:53 2021
GPGPU-Sim uArch: cycles simulated: 3000  inst.: 686456 (ipc=228.8) sim_rate=114409 (inst/sec) elapsed = 0:0:00:06 / Mon Jun 14 16:12:54 2021
GPGPU-Sim uArch: cycles simulated: 3500  inst.: 722996 (ipc=206.6) sim_rate=103285 (inst/sec) elapsed = 0:0:00:07 / Mon Jun 14 16:12:55 2021
GPGPU-Sim PTX: 800000 instructions simulated : ctaid=(11,9,0) tid=(9,7,0)
GPGPU-Sim uArch: cycles simulated: 4000  inst.: 767180 (ipc=191.8) sim_rate=95897 (inst/sec) elapsed = 0:0:00:08 / Mon Jun 14 16:12:56 2021
GPGPU-Sim uArch: cycles simulated: 4500  inst.: 852268 (ipc=189.4) sim_rate=94696 (inst/sec) elapsed = 0:0:00:09 / Mon Jun 14 16:12:57 2021
GPGPU-Sim PTX: 900000 instructions simulated : ctaid=(4,10,0) tid=(9,1,0)
GPGPU-Sim PTX: 1000000 instructions simulated : ctaid=(9,0,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 5000  inst.: 1010580 (ipc=202.1) sim_rate=101058 (inst/sec) elapsed = 0:0:00:10 / Mon Jun 14 16:12:58 2021
GPGPU-Sim PTX: 1100000 instructions simulated : ctaid=(14,2,0) tid=(3,0,0)
GPGPU-Sim PTX: 1200000 instructions simulated : ctaid=(13,8,0) tid=(1,7,0)
GPGPU-Sim PTX: 1300000 instructions simulated : ctaid=(1,12,0) tid=(5,1,0)
GPGPU-Sim PTX: 1400000 instructions simulated : ctaid=(3,6,0) tid=(9,3,0)
GPGPU-Sim uArch: cycles simulated: 5500  inst.: 1387024 (ipc=252.2) sim_rate=126093 (inst/sec) elapsed = 0:0:00:11 / Mon Jun 14 16:12:59 2021
GPGPU-Sim PTX: 1500000 instructions simulated : ctaid=(10,10,0) tid=(3,6,0)
GPGPU-Sim PTX: 1600000 instructions simulated : ctaid=(6,3,0) tid=(3,4,0)
GPGPU-Sim PTX: 1700000 instructions simulated : ctaid=(11,7,0) tid=(7,0,0)
GPGPU-Sim PTX: 1800000 instructions simulated : ctaid=(4,10,0) tid=(9,7,0)
GPGPU-Sim uArch: cycles simulated: 6000  inst.: 1834944 (ipc=305.8) sim_rate=152912 (inst/sec) elapsed = 0:0:00:12 / Mon Jun 14 16:13:00 2021
GPGPU-Sim PTX: 1900000 instructions simulated : ctaid=(10,0,0) tid=(5,3,0)
GPGPU-Sim PTX: 2000000 instructions simulated : ctaid=(12,3,0) tid=(9,5,0)
GPGPU-Sim PTX: 2100000 instructions simulated : ctaid=(3,1,0) tid=(3,4,0)
GPGPU-Sim PTX: 2200000 instructions simulated : ctaid=(8,9,0) tid=(3,8,0)
GPGPU-Sim PTX: 2300000 instructions simulated : ctaid=(12,7,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 6500  inst.: 2264888 (ipc=348.4) sim_rate=161777 (inst/sec) elapsed = 0:0:00:14 / Mon Jun 14 16:13:02 2021
GPGPU-Sim PTX: 2400000 instructions simulated : ctaid=(6,6,0) tid=(1,9,0)
GPGPU-Sim PTX: 2500000 instructions simulated : ctaid=(9,9,0) tid=(3,0,0)
GPGPU-Sim PTX: 2600000 instructions simulated : ctaid=(4,12,0) tid=(3,4,0)
GPGPU-Sim PTX: 2700000 instructions simulated : ctaid=(12,1,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 7000  inst.: 2673128 (ipc=381.9) sim_rate=178208 (inst/sec) elapsed = 0:0:00:15 / Mon Jun 14 16:13:03 2021
GPGPU-Sim PTX: 2800000 instructions simulated : ctaid=(3,0,0) tid=(7,2,0)
GPGPU-Sim PTX: 2900000 instructions simulated : ctaid=(0,10,0) tid=(1,5,0)
GPGPU-Sim PTX: 3000000 instructions simulated : ctaid=(5,4,0) tid=(1,3,0)
GPGPU-Sim uArch: cycles simulated: 7500  inst.: 3038248 (ipc=405.1) sim_rate=189890 (inst/sec) elapsed = 0:0:00:16 / Mon Jun 14 16:13:04 2021
GPGPU-Sim PTX: 3100000 instructions simulated : ctaid=(5,1,0) tid=(5,1,0)
GPGPU-Sim PTX: 3200000 instructions simulated : ctaid=(12,11,0) tid=(7,8,0)
GPGPU-Sim PTX: 3300000 instructions simulated : ctaid=(5,1,0) tid=(5,1,0)
GPGPU-Sim PTX: 3400000 instructions simulated : ctaid=(4,12,0) tid=(9,7,0)
GPGPU-Sim uArch: cycles simulated: 8000  inst.: 3416336 (ipc=427.0) sim_rate=200960 (inst/sec) elapsed = 0:0:00:17 / Mon Jun 14 16:13:05 2021
GPGPU-Sim PTX: 3500000 instructions simulated : ctaid=(11,13,0) tid=(1,9,0)
GPGPU-Sim PTX: 3600000 instructions simulated : ctaid=(10,12,0) tid=(9,1,0)
GPGPU-Sim PTX: 3700000 instructions simulated : ctaid=(4,8,0) tid=(7,4,0)
GPGPU-Sim PTX: 3800000 instructions simulated : ctaid=(13,11,0) tid=(3,0,0)
GPGPU-Sim uArch: cycles simulated: 8500  inst.: 3794960 (ipc=446.5) sim_rate=199734 (inst/sec) elapsed = 0:0:00:19 / Mon Jun 14 16:13:07 2021
GPGPU-Sim PTX: 3900000 instructions simulated : ctaid=(10,10,0) tid=(5,3,0)
GPGPU-Sim PTX: 4000000 instructions simulated : ctaid=(2,8,0) tid=(1,3,0)
GPGPU-Sim PTX: 4100000 instructions simulated : ctaid=(8,14,0) tid=(9,1,0)
GPGPU-Sim uArch: cycles simulated: 9000  inst.: 4145552 (ipc=460.6) sim_rate=207277 (inst/sec) elapsed = 0:0:00:20 / Mon Jun 14 16:13:08 2021
GPGPU-Sim PTX: 4200000 instructions simulated : ctaid=(13,11,0) tid=(3,8,0)
GPGPU-Sim PTX: 4300000 instructions simulated : ctaid=(9,1,0) tid=(3,2,0)
GPGPU-Sim PTX: 4400000 instructions simulated : ctaid=(7,5,0) tid=(1,1,0)
GPGPU-Sim PTX: 4500000 instructions simulated : ctaid=(4,6,0) tid=(1,9,0)
GPGPU-Sim uArch: cycles simulated: 9500  inst.: 4494464 (ipc=473.1) sim_rate=214022 (inst/sec) elapsed = 0:0:00:21 / Mon Jun 14 16:13:09 2021
GPGPU-Sim PTX: 4600000 instructions simulated : ctaid=(10,0,0) tid=(9,9,0)
GPGPU-Sim PTX: 4700000 instructions simulated : ctaid=(3,14,0) tid=(9,3,0)
GPGPU-Sim PTX: 4800000 instructions simulated : ctaid=(11,7,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 10000  inst.: 4846372 (ipc=484.6) sim_rate=220289 (inst/sec) elapsed = 0:0:00:22 / Mon Jun 14 16:13:10 2021
GPGPU-Sim PTX: 4900000 instructions simulated : ctaid=(3,10,0) tid=(5,3,0)
GPGPU-Sim PTX: 5000000 instructions simulated : ctaid=(10,14,0) tid=(5,9,0)
GPGPU-Sim PTX: 5100000 instructions simulated : ctaid=(8,10,0) tid=(1,3,0)
GPGPU-Sim PTX: 5200000 instructions simulated : ctaid=(11,1,0) tid=(1,9,0)
GPGPU-Sim uArch: cycles simulated: 10500  inst.: 5179948 (ipc=493.3) sim_rate=225215 (inst/sec) elapsed = 0:0:00:23 / Mon Jun 14 16:13:11 2021
GPGPU-Sim PTX: 5300000 instructions simulated : ctaid=(7,0,0) tid=(9,3,0)
GPGPU-Sim PTX: 5400000 instructions simulated : ctaid=(9,7,0) tid=(9,5,0)
GPGPU-Sim PTX: 5500000 instructions simulated : ctaid=(1,14,0) tid=(7,6,0)
GPGPU-Sim uArch: cycles simulated: 11000  inst.: 5542924 (ipc=503.9) sim_rate=221716 (inst/sec) elapsed = 0:0:00:25 / Mon Jun 14 16:13:13 2021
GPGPU-Sim PTX: 5600000 instructions simulated : ctaid=(3,8,0) tid=(7,8,0)
GPGPU-Sim PTX: 5700000 instructions simulated : ctaid=(8,0,0) tid=(3,8,0)
GPGPU-Sim PTX: 5800000 instructions simulated : ctaid=(2,7,0) tid=(1,3,0)
GPGPU-Sim PTX: 5900000 instructions simulated : ctaid=(4,12,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 11500  inst.: 5863464 (ipc=509.9) sim_rate=225517 (inst/sec) elapsed = 0:0:00:26 / Mon Jun 14 16:13:14 2021
GPGPU-Sim PTX: 6000000 instructions simulated : ctaid=(10,4,0) tid=(7,4,0)
GPGPU-Sim PTX: 6100000 instructions simulated : ctaid=(7,9,0) tid=(5,7,0)
GPGPU-Sim PTX: 6200000 instructions simulated : ctaid=(9,7,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 12000  inst.: 6198952 (ipc=516.6) sim_rate=229590 (inst/sec) elapsed = 0:0:00:27 / Mon Jun 14 16:13:15 2021
GPGPU-Sim PTX: 6300000 instructions simulated : ctaid=(0,11,0) tid=(9,7,0)
GPGPU-Sim PTX: 6400000 instructions simulated : ctaid=(11,5,0) tid=(9,3,0)
GPGPU-Sim PTX: 6500000 instructions simulated : ctaid=(6,10,0) tid=(3,0,0)
GPGPU-Sim uArch: cycles simulated: 12500  inst.: 6514828 (ipc=521.2) sim_rate=232672 (inst/sec) elapsed = 0:0:00:28 / Mon Jun 14 16:13:16 2021
GPGPU-Sim PTX: 6600000 instructions simulated : ctaid=(9,0,0) tid=(3,6,0)
GPGPU-Sim PTX: 6700000 instructions simulated : ctaid=(7,2,0) tid=(5,9,0)
GPGPU-Sim PTX: 6800000 instructions simulated : ctaid=(1,14,0) tid=(9,5,0)
GPGPU-Sim PTX: 6900000 instructions simulated : ctaid=(9,14,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 13000  inst.: 6850760 (ipc=527.0) sim_rate=236233 (inst/sec) elapsed = 0:0:00:29 / Mon Jun 14 16:13:17 2021
GPGPU-Sim PTX: 7000000 instructions simulated : ctaid=(5,1,0) tid=(1,7,0)
GPGPU-Sim PTX: 7100000 instructions simulated : ctaid=(13,0,0) tid=(5,1,0)
GPGPU-Sim PTX: 7200000 instructions simulated : ctaid=(10,11,0) tid=(9,3,0)
GPGPU-Sim uArch: cycles simulated: 13500  inst.: 7177796 (ipc=531.7) sim_rate=231541 (inst/sec) elapsed = 0:0:00:31 / Mon Jun 14 16:13:19 2021
GPGPU-Sim PTX: 7300000 instructions simulated : ctaid=(3,5,0) tid=(3,4,0)
GPGPU-Sim PTX: 7400000 instructions simulated : ctaid=(1,12,0) tid=(3,0,0)
GPGPU-Sim PTX: 7500000 instructions simulated : ctaid=(2,12,0) tid=(3,2,0)
GPGPU-Sim uArch: cycles simulated: 14000  inst.: 7513232 (ipc=536.7) sim_rate=234788 (inst/sec) elapsed = 0:0:00:32 / Mon Jun 14 16:13:20 2021
GPGPU-Sim PTX: 7600000 instructions simulated : ctaid=(12,4,0) tid=(3,8,0)
GPGPU-Sim PTX: 7700000 instructions simulated : ctaid=(5,6,0) tid=(5,9,0)
GPGPU-Sim PTX: 7800000 instructions simulated : ctaid=(10,0,0) tid=(7,4,0)
GPGPU-Sim PTX: 7900000 instructions simulated : ctaid=(11,5,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 14500  inst.: 7861928 (ipc=542.2) sim_rate=238240 (inst/sec) elapsed = 0:0:00:33 / Mon Jun 14 16:13:21 2021
GPGPU-Sim PTX: 8000000 instructions simulated : ctaid=(13,13,0) tid=(5,5,0)
GPGPU-Sim PTX: 8100000 instructions simulated : ctaid=(8,7,0) tid=(1,3,0)
GPGPU-Sim PTX: 8200000 instructions simulated : ctaid=(9,13,0) tid=(7,6,0)
GPGPU-Sim uArch: cycles simulated: 15000  inst.: 8177372 (ipc=545.2) sim_rate=240510 (inst/sec) elapsed = 0:0:00:34 / Mon Jun 14 16:13:22 2021
GPGPU-Sim PTX: 8300000 instructions simulated : ctaid=(4,4,0) tid=(3,0,0)
GPGPU-Sim PTX: 8400000 instructions simulated : ctaid=(10,10,0) tid=(1,7,0)
GPGPU-Sim PTX: 8500000 instructions simulated : ctaid=(0,9,0) tid=(9,9,0)
GPGPU-Sim uArch: cycles simulated: 15500  inst.: 8534132 (ipc=550.6) sim_rate=243832 (inst/sec) elapsed = 0:0:00:35 / Mon Jun 14 16:13:23 2021
GPGPU-Sim PTX: 8600000 instructions simulated : ctaid=(12,12,0) tid=(1,9,0)
GPGPU-Sim PTX: 8700000 instructions simulated : ctaid=(4,3,0) tid=(1,3,0)
GPGPU-Sim PTX: 8800000 instructions simulated : ctaid=(9,14,0) tid=(1,1,0)
GPGPU-Sim uArch: cycles simulated: 16000  inst.: 8846628 (ipc=552.9) sim_rate=245739 (inst/sec) elapsed = 0:0:00:36 / Mon Jun 14 16:13:24 2021
GPGPU-Sim PTX: 8900000 instructions simulated : ctaid=(0,1,0) tid=(7,4,0)
GPGPU-Sim PTX: 9000000 instructions simulated : ctaid=(10,11,0) tid=(1,7,0)
GPGPU-Sim PTX: 9100000 instructions simulated : ctaid=(12,14,0) tid=(1,9,0)
GPGPU-Sim PTX: 9200000 instructions simulated : ctaid=(10,7,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 16500  inst.: 9193752 (ipc=557.2) sim_rate=241940 (inst/sec) elapsed = 0:0:00:38 / Mon Jun 14 16:13:26 2021
GPGPU-Sim PTX: 9300000 instructions simulated : ctaid=(10,2,0) tid=(1,9,0)
GPGPU-Sim PTX: 9400000 instructions simulated : ctaid=(2,12,0) tid=(1,9,0)
GPGPU-Sim PTX: 9500000 instructions simulated : ctaid=(3,9,0) tid=(7,8,0)
GPGPU-Sim uArch: cycles simulated: 17000  inst.: 9519480 (ipc=560.0) sim_rate=244089 (inst/sec) elapsed = 0:0:00:39 / Mon Jun 14 16:13:27 2021
GPGPU-Sim PTX: 9600000 instructions simulated : ctaid=(12,9,0) tid=(5,9,0)
GPGPU-Sim PTX: 9700000 instructions simulated : ctaid=(1,3,0) tid=(7,0,0)
GPGPU-Sim PTX: 9800000 instructions simulated : ctaid=(7,0,0) tid=(5,7,0)
GPGPU-Sim uArch: cycles simulated: 17500  inst.: 9845216 (ipc=562.6) sim_rate=246130 (inst/sec) elapsed = 0:0:00:40 / Mon Jun 14 16:13:28 2021
GPGPU-Sim PTX: 9900000 instructions simulated : ctaid=(1,6,0) tid=(3,2,0)
GPGPU-Sim PTX: 10000000 instructions simulated : ctaid=(10,13,0) tid=(9,1,0)
GPGPU-Sim PTX: 10100000 instructions simulated : ctaid=(10,10,0) tid=(1,5,0)
GPGPU-Sim PTX: 10200000 instructions simulated : ctaid=(2,4,0) tid=(1,5,0)
GPGPU-Sim uArch: cycles simulated: 18000  inst.: 10175904 (ipc=565.3) sim_rate=248192 (inst/sec) elapsed = 0:0:00:41 / Mon Jun 14 16:13:29 2021
GPGPU-Sim PTX: 10300000 instructions simulated : ctaid=(10,6,0) tid=(9,7,0)
GPGPU-Sim PTX: 10400000 instructions simulated : ctaid=(8,8,0) tid=(9,9,0)
GPGPU-Sim PTX: 10500000 instructions simulated : ctaid=(13,8,0) tid=(3,0,0)
GPGPU-Sim uArch: cycles simulated: 18500  inst.: 10526504 (ipc=569.0) sim_rate=250631 (inst/sec) elapsed = 0:0:00:42 / Mon Jun 14 16:13:30 2021
GPGPU-Sim PTX: 10600000 instructions simulated : ctaid=(13,12,0) tid=(5,7,0)
GPGPU-Sim PTX: 10700000 instructions simulated : ctaid=(14,2,0) tid=(7,4,0)
GPGPU-Sim PTX: 10800000 instructions simulated : ctaid=(9,1,0) tid=(3,6,0)
GPGPU-Sim PTX: 10900000 instructions simulated : ctaid=(14,6,0) tid=(7,8,0)
GPGPU-Sim uArch: cycles simulated: 19000  inst.: 10861260 (ipc=571.6) sim_rate=252587 (inst/sec) elapsed = 0:0:00:43 / Mon Jun 14 16:13:31 2021
GPGPU-Sim PTX: 11000000 instructions simulated : ctaid=(3,13,0) tid=(1,3,0)
GPGPU-Sim PTX: 11100000 instructions simulated : ctaid=(13,11,0) tid=(7,6,0)
GPGPU-Sim PTX: 11200000 instructions simulated : ctaid=(11,6,0) tid=(7,6,0)
GPGPU-Sim uArch: cycles simulated: 19500  inst.: 11182232 (ipc=573.4) sim_rate=248494 (inst/sec) elapsed = 0:0:00:45 / Mon Jun 14 16:13:33 2021
GPGPU-Sim PTX: 11300000 instructions simulated : ctaid=(7,7,0) tid=(5,9,0)
GPGPU-Sim PTX: 11400000 instructions simulated : ctaid=(9,3,0) tid=(9,1,0)
GPGPU-Sim PTX: 11500000 instructions simulated : ctaid=(14,10,0) tid=(1,5,0)
GPGPU-Sim uArch: cycles simulated: 20000  inst.: 11510844 (ipc=575.5) sim_rate=250235 (inst/sec) elapsed = 0:0:00:46 / Mon Jun 14 16:13:34 2021
GPGPU-Sim PTX: 11600000 instructions simulated : ctaid=(2,13,0) tid=(3,0,0)
GPGPU-Sim PTX: 11700000 instructions simulated : ctaid=(10,10,0) tid=(5,3,0)
GPGPU-Sim PTX: 11800000 instructions simulated : ctaid=(1,6,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 20500  inst.: 11845228 (ipc=577.8) sim_rate=252026 (inst/sec) elapsed = 0:0:00:47 / Mon Jun 14 16:13:35 2021
GPGPU-Sim PTX: 11900000 instructions simulated : ctaid=(12,9,0) tid=(5,1,0)
GPGPU-Sim PTX: 12000000 instructions simulated : ctaid=(9,2,0) tid=(9,5,0)
GPGPU-Sim PTX: 12100000 instructions simulated : ctaid=(10,1,0) tid=(7,6,0)
GPGPU-Sim PTX: 12200000 instructions simulated : ctaid=(12,2,0) tid=(3,2,0)
GPGPU-Sim uArch: cycles simulated: 21000  inst.: 12183192 (ipc=580.2) sim_rate=253816 (inst/sec) elapsed = 0:0:00:48 / Mon Jun 14 16:13:36 2021
GPGPU-Sim PTX: 12300000 instructions simulated : ctaid=(7,7,0) tid=(7,6,0)
GPGPU-Sim PTX: 12400000 instructions simulated : ctaid=(9,1,0) tid=(3,0,0)
GPGPU-Sim PTX: 12500000 instructions simulated : ctaid=(14,10,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 21500  inst.: 12511912 (ipc=581.9) sim_rate=255345 (inst/sec) elapsed = 0:0:00:49 / Mon Jun 14 16:13:37 2021
GPGPU-Sim PTX: 12600000 instructions simulated : ctaid=(13,7,0) tid=(9,9,0)
GPGPU-Sim PTX: 12700000 instructions simulated : ctaid=(0,13,0) tid=(5,5,0)
GPGPU-Sim PTX: 12800000 instructions simulated : ctaid=(5,9,0) tid=(1,9,0)
GPGPU-Sim PTX: 12900000 instructions simulated : ctaid=(5,13,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 22000  inst.: 12861064 (ipc=584.6) sim_rate=257221 (inst/sec) elapsed = 0:0:00:50 / Mon Jun 14 16:13:38 2021
GPGPU-Sim PTX: 13000000 instructions simulated : ctaid=(14,11,0) tid=(9,7,0)
GPGPU-Sim PTX: 13100000 instructions simulated : ctaid=(1,14,0) tid=(9,3,0)
GPGPU-Sim PTX: 13200000 instructions simulated : ctaid=(11,12,0) tid=(3,6,0)
GPGPU-Sim uArch: cycles simulated: 22500  inst.: 13200500 (ipc=586.7) sim_rate=253855 (inst/sec) elapsed = 0:0:00:52 / Mon Jun 14 16:13:40 2021
GPGPU-Sim PTX: 13300000 instructions simulated : ctaid=(11,1,0) tid=(7,0,0)
GPGPU-Sim PTX: 13400000 instructions simulated : ctaid=(11,4,0) tid=(5,9,0)
GPGPU-Sim PTX: 13500000 instructions simulated : ctaid=(3,13,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 23000  inst.: 13520992 (ipc=587.9) sim_rate=255113 (inst/sec) elapsed = 0:0:00:53 / Mon Jun 14 16:13:41 2021
GPGPU-Sim PTX: 13600000 instructions simulated : ctaid=(5,12,0) tid=(9,5,0)
GPGPU-Sim PTX: 13700000 instructions simulated : ctaid=(3,0,0) tid=(9,7,0)
GPGPU-Sim PTX: 13800000 instructions simulated : ctaid=(1,2,0) tid=(3,6,0)
GPGPU-Sim uArch: cycles simulated: 23500  inst.: 13840084 (ipc=588.9) sim_rate=256297 (inst/sec) elapsed = 0:0:00:54 / Mon Jun 14 16:13:42 2021
GPGPU-Sim PTX: 13900000 instructions simulated : ctaid=(11,14,0) tid=(1,9,0)
GPGPU-Sim PTX: 14000000 instructions simulated : ctaid=(7,6,0) tid=(5,1,0)
GPGPU-Sim PTX: 14100000 instructions simulated : ctaid=(9,13,0) tid=(7,2,0)
GPGPU-Sim PTX: 14200000 instructions simulated : ctaid=(14,8,0) tid=(5,1,0)
GPGPU-Sim uArch: cycles simulated: 24000  inst.: 14188812 (ipc=591.2) sim_rate=257978 (inst/sec) elapsed = 0:0:00:55 / Mon Jun 14 16:13:43 2021
GPGPU-Sim PTX: 14300000 instructions simulated : ctaid=(10,11,0) tid=(7,0,0)
GPGPU-Sim PTX: 14400000 instructions simulated : ctaid=(13,13,0) tid=(3,2,0)
GPGPU-Sim PTX: 14500000 instructions simulated : ctaid=(3,10,0) tid=(3,2,0)
GPGPU-Sim uArch: cycles simulated: 24500  inst.: 14517376 (ipc=592.5) sim_rate=259238 (inst/sec) elapsed = 0:0:00:56 / Mon Jun 14 16:13:44 2021
GPGPU-Sim PTX: 14600000 instructions simulated : ctaid=(12,7,0) tid=(5,1,0)
GPGPU-Sim PTX: 14700000 instructions simulated : ctaid=(7,7,0) tid=(5,7,0)
GPGPU-Sim PTX: 14800000 instructions simulated : ctaid=(13,8,0) tid=(3,4,0)
GPGPU-Sim uArch: cycles simulated: 25000  inst.: 14846428 (ipc=593.9) sim_rate=260463 (inst/sec) elapsed = 0:0:00:57 / Mon Jun 14 16:13:45 2021
GPGPU-Sim PTX: 14900000 instructions simulated : ctaid=(9,5,0) tid=(9,9,0)
GPGPU-Sim PTX: 15000000 instructions simulated : ctaid=(2,8,0) tid=(3,8,0)
GPGPU-Sim PTX: 15100000 instructions simulated : ctaid=(8,13,0) tid=(9,5,0)
GPGPU-Sim PTX: 15200000 instructions simulated : ctaid=(4,12,0) tid=(1,1,0)
GPGPU-Sim uArch: cycles simulated: 25500  inst.: 15182476 (ipc=595.4) sim_rate=257330 (inst/sec) elapsed = 0:0:00:59 / Mon Jun 14 16:13:47 2021
GPGPU-Sim PTX: 15300000 instructions simulated : ctaid=(6,12,0) tid=(3,2,0)
GPGPU-Sim PTX: 15400000 instructions simulated : ctaid=(13,0,0) tid=(1,7,0)
GPGPU-Sim PTX: 15500000 instructions simulated : ctaid=(10,1,0) tid=(1,1,0)
GPGPU-Sim uArch: cycles simulated: 26000  inst.: 15506780 (ipc=596.4) sim_rate=258446 (inst/sec) elapsed = 0:0:01:00 / Mon Jun 14 16:13:48 2021
GPGPU-Sim PTX: 15600000 instructions simulated : ctaid=(2,11,0) tid=(1,1,0)
GPGPU-Sim PTX: 15700000 instructions simulated : ctaid=(5,11,0) tid=(9,7,0)
GPGPU-Sim PTX: 15800000 instructions simulated : ctaid=(2,13,0) tid=(9,1,0)
GPGPU-Sim PTX: 15900000 instructions simulated : ctaid=(6,5,0) tid=(9,1,0)
GPGPU-Sim uArch: cycles simulated: 26500  inst.: 15853292 (ipc=598.2) sim_rate=259890 (inst/sec) elapsed = 0:0:01:01 / Mon Jun 14 16:13:49 2021
GPGPU-Sim PTX: 16000000 instructions simulated : ctaid=(13,2,0) tid=(7,2,0)
GPGPU-Sim PTX: 16100000 instructions simulated : ctaid=(14,0,0) tid=(1,9,0)
GPGPU-Sim PTX: 16200000 instructions simulated : ctaid=(12,10,0) tid=(9,7,0)
GPGPU-Sim uArch: cycles simulated: 27000  inst.: 16182996 (ipc=599.4) sim_rate=261016 (inst/sec) elapsed = 0:0:01:02 / Mon Jun 14 16:13:50 2021
GPGPU-Sim PTX: 16300000 instructions simulated : ctaid=(5,10,0) tid=(3,6,0)
GPGPU-Sim PTX: 16400000 instructions simulated : ctaid=(13,10,0) tid=(7,0,0)
GPGPU-Sim PTX: 16500000 instructions simulated : ctaid=(14,13,0) tid=(5,7,0)
GPGPU-Sim uArch: cycles simulated: 27500  inst.: 16529016 (ipc=601.1) sim_rate=262365 (inst/sec) elapsed = 0:0:01:03 / Mon Jun 14 16:13:51 2021
GPGPU-Sim PTX: 16600000 instructions simulated : ctaid=(11,10,0) tid=(3,8,0)
GPGPU-Sim PTX: 16700000 instructions simulated : ctaid=(12,6,0) tid=(5,9,0)
GPGPU-Sim PTX: 16800000 instructions simulated : ctaid=(1,12,0) tid=(7,8,0)
GPGPU-Sim PTX: 16900000 instructions simulated : ctaid=(9,11,0) tid=(7,8,0)
GPGPU-Sim uArch: cycles simulated: 28000  inst.: 16859168 (ipc=602.1) sim_rate=263424 (inst/sec) elapsed = 0:0:01:04 / Mon Jun 14 16:13:52 2021
GPGPU-Sim PTX: 17000000 instructions simulated : ctaid=(5,5,0) tid=(5,5,0)
GPGPU-Sim PTX: 17100000 instructions simulated : ctaid=(7,11,0) tid=(1,9,0)
GPGPU-Sim PTX: 17200000 instructions simulated : ctaid=(6,7,0) tid=(1,1,0)
GPGPU-Sim uArch: cycles simulated: 28500  inst.: 17206480 (ipc=603.7) sim_rate=260704 (inst/sec) elapsed = 0:0:01:06 / Mon Jun 14 16:13:54 2021
GPGPU-Sim PTX: 17300000 instructions simulated : ctaid=(14,3,0) tid=(9,1,0)
GPGPU-Sim PTX: 17400000 instructions simulated : ctaid=(1,11,0) tid=(9,1,0)
GPGPU-Sim PTX: 17500000 instructions simulated : ctaid=(9,3,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 29000  inst.: 17530396 (ipc=604.5) sim_rate=261647 (inst/sec) elapsed = 0:0:01:07 / Mon Jun 14 16:13:55 2021
GPGPU-Sim PTX: 17600000 instructions simulated : ctaid=(8,5,0) tid=(9,7,0)
GPGPU-Sim PTX: 17700000 instructions simulated : ctaid=(9,2,0) tid=(1,3,0)
GPGPU-Sim PTX: 17800000 instructions simulated : ctaid=(11,11,0) tid=(5,9,0)
GPGPU-Sim PTX: 17900000 instructions simulated : ctaid=(14,8,0) tid=(7,0,0)
GPGPU-Sim uArch: cycles simulated: 29500  inst.: 17869380 (ipc=605.7) sim_rate=262785 (inst/sec) elapsed = 0:0:01:08 / Mon Jun 14 16:13:56 2021
GPGPU-Sim PTX: 18000000 instructions simulated : ctaid=(11,8,0) tid=(1,1,0)
GPGPU-Sim PTX: 18100000 instructions simulated : ctaid=(9,9,0) tid=(3,0,0)
GPGPU-Sim PTX: 18200000 instructions simulated : ctaid=(12,13,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 30000  inst.: 18174448 (ipc=605.8) sim_rate=263397 (inst/sec) elapsed = 0:0:01:09 / Mon Jun 14 16:13:57 2021
GPGPU-Sim PTX: 18300000 instructions simulated : ctaid=(4,13,0) tid=(3,8,0)
GPGPU-Sim PTX: 18400000 instructions simulated : ctaid=(1,14,0) tid=(9,1,0)
GPGPU-Sim PTX: 18500000 instructions simulated : ctaid=(0,0,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 30500  inst.: 18511332 (ipc=606.9) sim_rate=264447 (inst/sec) elapsed = 0:0:01:10 / Mon Jun 14 16:13:58 2021
GPGPU-Sim PTX: 18600000 instructions simulated : ctaid=(11,5,0) tid=(7,6,0)
GPGPU-Sim PTX: 18700000 instructions simulated : ctaid=(5,8,0) tid=(7,4,0)
GPGPU-Sim PTX: 18800000 instructions simulated : ctaid=(3,14,0) tid=(9,3,0)
GPGPU-Sim PTX: 18900000 instructions simulated : ctaid=(1,10,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 31000  inst.: 18869068 (ipc=608.7) sim_rate=262070 (inst/sec) elapsed = 0:0:01:12 / Mon Jun 14 16:14:00 2021
GPGPU-Sim PTX: 19000000 instructions simulated : ctaid=(11,0,0) tid=(9,3,0)
GPGPU-Sim PTX: 19100000 instructions simulated : ctaid=(1,4,0) tid=(7,8,0)
GPGPU-Sim PTX: 19200000 instructions simulated : ctaid=(4,9,0) tid=(9,3,0)
GPGPU-Sim uArch: cycles simulated: 31500  inst.: 19201680 (ipc=609.6) sim_rate=263036 (inst/sec) elapsed = 0:0:01:13 / Mon Jun 14 16:14:01 2021
GPGPU-Sim PTX: 19300000 instructions simulated : ctaid=(6,13,0) tid=(9,9,0)
GPGPU-Sim PTX: 19400000 instructions simulated : ctaid=(11,0,0) tid=(9,1,0)
GPGPU-Sim PTX: 19500000 instructions simulated : ctaid=(9,2,0) tid=(1,5,0)
GPGPU-Sim uArch: cycles simulated: 32000  inst.: 19534468 (ipc=610.5) sim_rate=263979 (inst/sec) elapsed = 0:0:01:14 / Mon Jun 14 16:14:02 2021
GPGPU-Sim PTX: 19600000 instructions simulated : ctaid=(8,2,0) tid=(1,5,0)
GPGPU-Sim PTX: 19700000 instructions simulated : ctaid=(11,14,0) tid=(7,6,0)
GPGPU-Sim PTX: 19800000 instructions simulated : ctaid=(12,2,0) tid=(7,0,0)
GPGPU-Sim uArch: cycles simulated: 32500  inst.: 19828596 (ipc=610.1) sim_rate=264381 (inst/sec) elapsed = 0:0:01:15 / Mon Jun 14 16:14:03 2021
GPGPU-Sim PTX: 19900000 instructions simulated : ctaid=(4,3,0) tid=(3,6,0)
GPGPU-Sim PTX: 20000000 instructions simulated : ctaid=(11,5,0) tid=(9,3,0)
GPGPU-Sim PTX: 20100000 instructions simulated : ctaid=(1,1,0) tid=(1,3,0)
GPGPU-Sim PTX: 20200000 instructions simulated : ctaid=(4,2,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 33000  inst.: 20172780 (ipc=611.3) sim_rate=265431 (inst/sec) elapsed = 0:0:01:16 / Mon Jun 14 16:14:04 2021
GPGPU-Sim PTX: 20300000 instructions simulated : ctaid=(9,5,0) tid=(1,1,0)
GPGPU-Sim PTX: 20400000 instructions simulated : ctaid=(10,10,0) tid=(9,5,0)
GPGPU-Sim PTX: 20500000 instructions simulated : ctaid=(9,4,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 33500  inst.: 20511528 (ipc=612.3) sim_rate=266383 (inst/sec) elapsed = 0:0:01:17 / Mon Jun 14 16:14:05 2021
GPGPU-Sim PTX: 20600000 instructions simulated : ctaid=(6,12,0) tid=(1,7,0)
GPGPU-Sim PTX: 20700000 instructions simulated : ctaid=(12,12,0) tid=(7,2,0)
GPGPU-Sim PTX: 20800000 instructions simulated : ctaid=(8,4,0) tid=(9,7,0)
GPGPU-Sim PTX: 20900000 instructions simulated : ctaid=(8,6,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 34000  inst.: 20866696 (ipc=613.7) sim_rate=267521 (inst/sec) elapsed = 0:0:01:18 / Mon Jun 14 16:14:06 2021
GPGPU-Sim PTX: 21000000 instructions simulated : ctaid=(8,9,0) tid=(5,9,0)
GPGPU-Sim PTX: 21100000 instructions simulated : ctaid=(7,2,0) tid=(9,1,0)
GPGPU-Sim PTX: 21200000 instructions simulated : ctaid=(1,3,0) tid=(5,3,0)
GPGPU-Sim uArch: cycles simulated: 34500  inst.: 21196192 (ipc=614.4) sim_rate=264952 (inst/sec) elapsed = 0:0:01:20 / Mon Jun 14 16:14:08 2021
GPGPU-Sim PTX: 21300000 instructions simulated : ctaid=(6,6,0) tid=(5,9,0)
GPGPU-Sim PTX: 21400000 instructions simulated : ctaid=(4,2,0) tid=(9,5,0)
GPGPU-Sim PTX: 21500000 instructions simulated : ctaid=(2,5,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 35000  inst.: 21528812 (ipc=615.1) sim_rate=265787 (inst/sec) elapsed = 0:0:01:21 / Mon Jun 14 16:14:09 2021
GPGPU-Sim PTX: 21600000 instructions simulated : ctaid=(12,14,0) tid=(1,3,0)
GPGPU-Sim PTX: 21700000 instructions simulated : ctaid=(11,9,0) tid=(9,7,0)
GPGPU-Sim PTX: 21800000 instructions simulated : ctaid=(5,14,0) tid=(3,0,0)
GPGPU-Sim PTX: 21900000 instructions simulated : ctaid=(12,3,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 35500  inst.: 21850644 (ipc=615.5) sim_rate=266471 (inst/sec) elapsed = 0:0:01:22 / Mon Jun 14 16:14:10 2021
GPGPU-Sim PTX: 22000000 instructions simulated : ctaid=(5,2,0) tid=(9,9,0)
GPGPU-Sim PTX: 22100000 instructions simulated : ctaid=(6,0,0) tid=(5,9,0)
GPGPU-Sim PTX: 22200000 instructions simulated : ctaid=(11,9,0) tid=(7,4,0)
GPGPU-Sim uArch: cycles simulated: 36000  inst.: 22180956 (ipc=616.1) sim_rate=267240 (inst/sec) elapsed = 0:0:01:23 / Mon Jun 14 16:14:11 2021
GPGPU-Sim PTX: 22300000 instructions simulated : ctaid=(2,1,0) tid=(5,3,0)
GPGPU-Sim PTX: 22400000 instructions simulated : ctaid=(1,5,0) tid=(5,3,0)
GPGPU-Sim PTX: 22500000 instructions simulated : ctaid=(2,4,0) tid=(7,0,0)
GPGPU-Sim uArch: cycles simulated: 36500  inst.: 22532600 (ipc=617.3) sim_rate=268245 (inst/sec) elapsed = 0:0:01:24 / Mon Jun 14 16:14:12 2021
GPGPU-Sim PTX: 22600000 instructions simulated : ctaid=(14,9,0) tid=(1,7,0)
GPGPU-Sim PTX: 22700000 instructions simulated : ctaid=(2,12,0) tid=(1,7,0)
GPGPU-Sim PTX: 22800000 instructions simulated : ctaid=(5,14,0) tid=(9,5,0)
GPGPU-Sim PTX: 22900000 instructions simulated : ctaid=(4,0,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 37000  inst.: 22872968 (ipc=618.2) sim_rate=269093 (inst/sec) elapsed = 0:0:01:25 / Mon Jun 14 16:14:13 2021
GPGPU-Sim PTX: 23000000 instructions simulated : ctaid=(13,13,0) tid=(7,2,0)
GPGPU-Sim PTX: 23100000 instructions simulated : ctaid=(6,3,0) tid=(3,0,0)
GPGPU-Sim PTX: 23200000 instructions simulated : ctaid=(3,10,0) tid=(3,6,0)
GPGPU-Sim uArch: cycles simulated: 37500  inst.: 23195320 (ipc=618.5) sim_rate=266612 (inst/sec) elapsed = 0:0:01:27 / Mon Jun 14 16:14:15 2021
GPGPU-Sim PTX: 23300000 instructions simulated : ctaid=(8,6,0) tid=(9,1,0)
GPGPU-Sim PTX: 23400000 instructions simulated : ctaid=(1,0,0) tid=(9,7,0)
GPGPU-Sim PTX: 23500000 instructions simulated : ctaid=(0,10,0) tid=(7,8,0)
GPGPU-Sim uArch: cycles simulated: 38000  inst.: 23505996 (ipc=618.6) sim_rate=267113 (inst/sec) elapsed = 0:0:01:28 / Mon Jun 14 16:14:16 2021
GPGPU-Sim PTX: 23600000 instructions simulated : ctaid=(5,11,0) tid=(1,1,0)
GPGPU-Sim PTX: 23700000 instructions simulated : ctaid=(13,0,0) tid=(9,3,0)
GPGPU-Sim PTX: 23800000 instructions simulated : ctaid=(1,10,0) tid=(9,1,0)
GPGPU-Sim uArch: cycles simulated: 38500  inst.: 23837924 (ipc=619.2) sim_rate=267841 (inst/sec) elapsed = 0:0:01:29 / Mon Jun 14 16:14:17 2021
GPGPU-Sim PTX: 23900000 instructions simulated : ctaid=(4,3,0) tid=(5,1,0)
GPGPU-Sim PTX: 24000000 instructions simulated : ctaid=(13,7,0) tid=(1,9,0)
GPGPU-Sim PTX: 24100000 instructions simulated : ctaid=(6,1,0) tid=(9,7,0)
GPGPU-Sim PTX: 24200000 instructions simulated : ctaid=(5,11,0) tid=(5,3,0)
GPGPU-Sim uArch: cycles simulated: 39000  inst.: 24174400 (ipc=619.9) sim_rate=268604 (inst/sec) elapsed = 0:0:01:30 / Mon Jun 14 16:14:18 2021
GPGPU-Sim PTX: 24300000 instructions simulated : ctaid=(13,6,0) tid=(7,2,0)
GPGPU-Sim PTX: 24400000 instructions simulated : ctaid=(5,7,0) tid=(5,1,0)
GPGPU-Sim PTX: 24500000 instructions simulated : ctaid=(10,14,0) tid=(7,6,0)
GPGPU-Sim uArch: cycles simulated: 39500  inst.: 24517452 (ipc=620.7) sim_rate=269422 (inst/sec) elapsed = 0:0:01:31 / Mon Jun 14 16:14:19 2021
GPGPU-Sim PTX: 24600000 instructions simulated : ctaid=(7,7,0) tid=(1,9,0)
GPGPU-Sim PTX: 24700000 instructions simulated : ctaid=(0,8,0) tid=(3,8,0)
GPGPU-Sim PTX: 24800000 instructions simulated : ctaid=(8,8,0) tid=(1,5,0)
GPGPU-Sim uArch: cycles simulated: 40000  inst.: 24838280 (ipc=621.0) sim_rate=269981 (inst/sec) elapsed = 0:0:01:32 / Mon Jun 14 16:14:20 2021
GPGPU-Sim PTX: 24900000 instructions simulated : ctaid=(13,2,0) tid=(1,9,0)
GPGPU-Sim PTX: 25000000 instructions simulated : ctaid=(10,0,0) tid=(3,4,0)
GPGPU-Sim PTX: 25100000 instructions simulated : ctaid=(4,5,0) tid=(9,1,0)
GPGPU-Sim PTX: 25200000 instructions simulated : ctaid=(11,13,0) tid=(5,1,0)
GPGPU-Sim uArch: cycles simulated: 40500  inst.: 25182424 (ipc=621.8) sim_rate=270778 (inst/sec) elapsed = 0:0:01:33 / Mon Jun 14 16:14:21 2021
GPGPU-Sim PTX: 25300000 instructions simulated : ctaid=(14,12,0) tid=(1,1,0)
GPGPU-Sim PTX: 25400000 instructions simulated : ctaid=(4,3,0) tid=(1,1,0)
GPGPU-Sim PTX: 25500000 instructions simulated : ctaid=(6,10,0) tid=(3,6,0)
GPGPU-Sim uArch: cycles simulated: 41000  inst.: 25501984 (ipc=622.0) sim_rate=268441 (inst/sec) elapsed = 0:0:01:35 / Mon Jun 14 16:14:23 2021
GPGPU-Sim PTX: 25600000 instructions simulated : ctaid=(14,0,0) tid=(3,6,0)
GPGPU-Sim PTX: 25700000 instructions simulated : ctaid=(7,3,0) tid=(7,2,0)
GPGPU-Sim PTX: 25800000 instructions simulated : ctaid=(1,8,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 41500  inst.: 25839444 (ipc=622.6) sim_rate=269160 (inst/sec) elapsed = 0:0:01:36 / Mon Jun 14 16:14:24 2021
GPGPU-Sim PTX: 25900000 instructions simulated : ctaid=(13,4,0) tid=(7,0,0)
GPGPU-Sim PTX: 26000000 instructions simulated : ctaid=(12,14,0) tid=(5,5,0)
GPGPU-Sim PTX: 26100000 instructions simulated : ctaid=(8,14,0) tid=(1,7,0)
GPGPU-Sim PTX: 26200000 instructions simulated : ctaid=(8,0,0) tid=(3,6,0)
GPGPU-Sim uArch: cycles simulated: 42000  inst.: 26158016 (ipc=622.8) sim_rate=269670 (inst/sec) elapsed = 0:0:01:37 / Mon Jun 14 16:14:25 2021
GPGPU-Sim PTX: 26300000 instructions simulated : ctaid=(6,9,0) tid=(9,1,0)
GPGPU-Sim PTX: 26400000 instructions simulated : ctaid=(4,6,0) tid=(9,9,0)
GPGPU-Sim PTX: 26500000 instructions simulated : ctaid=(7,14,0) tid=(7,8,0)
GPGPU-Sim uArch: cycles simulated: 42500  inst.: 26488808 (ipc=623.3) sim_rate=270293 (inst/sec) elapsed = 0:0:01:38 / Mon Jun 14 16:14:26 2021
GPGPU-Sim PTX: 26600000 instructions simulated : ctaid=(0,1,0) tid=(3,4,0)
GPGPU-Sim PTX: 26700000 instructions simulated : ctaid=(13,1,0) tid=(5,1,0)
GPGPU-Sim PTX: 26800000 instructions simulated : ctaid=(14,1,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 43000  inst.: 26799472 (ipc=623.2) sim_rate=270701 (inst/sec) elapsed = 0:0:01:39 / Mon Jun 14 16:14:27 2021
GPGPU-Sim PTX: 26900000 instructions simulated : ctaid=(14,5,0) tid=(1,9,0)
GPGPU-Sim PTX: 27000000 instructions simulated : ctaid=(9,8,0) tid=(5,7,0)
GPGPU-Sim PTX: 27100000 instructions simulated : ctaid=(3,1,0) tid=(5,7,0)
GPGPU-Sim uArch: cycles simulated: 43500  inst.: 27137944 (ipc=623.9) sim_rate=271379 (inst/sec) elapsed = 0:0:01:40 / Mon Jun 14 16:14:28 2021
GPGPU-Sim PTX: 27200000 instructions simulated : ctaid=(5,13,0) tid=(7,0,0)
GPGPU-Sim PTX: 27300000 instructions simulated : ctaid=(11,8,0) tid=(7,6,0)
GPGPU-Sim PTX: 27400000 instructions simulated : ctaid=(8,1,0) tid=(9,3,0)
GPGPU-Sim PTX: 27500000 instructions simulated : ctaid=(10,4,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 44000  inst.: 27461176 (ipc=624.1) sim_rate=271892 (inst/sec) elapsed = 0:0:01:41 / Mon Jun 14 16:14:29 2021
GPGPU-Sim PTX: 27600000 instructions simulated : ctaid=(3,4,0) tid=(5,5,0)
GPGPU-Sim PTX: 27700000 instructions simulated : ctaid=(6,2,0) tid=(3,2,0)
GPGPU-Sim PTX: 27800000 instructions simulated : ctaid=(0,8,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 44500  inst.: 27822080 (ipc=625.2) sim_rate=270117 (inst/sec) elapsed = 0:0:01:43 / Mon Jun 14 16:14:31 2021
GPGPU-Sim PTX: 27900000 instructions simulated : ctaid=(6,10,0) tid=(5,7,0)
GPGPU-Sim PTX: 28000000 instructions simulated : ctaid=(9,6,0) tid=(1,3,0)
GPGPU-Sim PTX: 28100000 instructions simulated : ctaid=(10,3,0) tid=(5,3,0)
GPGPU-Sim uArch: cycles simulated: 45000  inst.: 28146700 (ipc=625.5) sim_rate=270641 (inst/sec) elapsed = 0:0:01:44 / Mon Jun 14 16:14:32 2021
GPGPU-Sim PTX: 28200000 instructions simulated : ctaid=(9,2,0) tid=(1,1,0)
GPGPU-Sim PTX: 28300000 instructions simulated : ctaid=(10,6,0) tid=(5,5,0)
GPGPU-Sim PTX: 28400000 instructions simulated : ctaid=(2,0,0) tid=(1,5,0)
GPGPU-Sim PTX: 28500000 instructions simulated : ctaid=(4,14,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 45500  inst.: 28480056 (ipc=625.9) sim_rate=271238 (inst/sec) elapsed = 0:0:01:45 / Mon Jun 14 16:14:33 2021
GPGPU-Sim PTX: 28600000 instructions simulated : ctaid=(8,11,0) tid=(7,4,0)
GPGPU-Sim PTX: 28700000 instructions simulated : ctaid=(13,7,0) tid=(9,5,0)
GPGPU-Sim PTX: 28800000 instructions simulated : ctaid=(8,10,0) tid=(9,1,0)
GPGPU-Sim uArch: cycles simulated: 46000  inst.: 28803560 (ipc=626.2) sim_rate=271731 (inst/sec) elapsed = 0:0:01:46 / Mon Jun 14 16:14:34 2021
GPGPU-Sim PTX: 28900000 instructions simulated : ctaid=(14,4,0) tid=(1,9,0)
GPGPU-Sim PTX: 29000000 instructions simulated : ctaid=(7,5,0) tid=(9,1,0)
GPGPU-Sim PTX: 29100000 instructions simulated : ctaid=(11,1,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 46500  inst.: 29129700 (ipc=626.4) sim_rate=272240 (inst/sec) elapsed = 0:0:01:47 / Mon Jun 14 16:14:35 2021
GPGPU-Sim PTX: 29200000 instructions simulated : ctaid=(2,9,0) tid=(9,1,0)
GPGPU-Sim PTX: 29300000 instructions simulated : ctaid=(14,6,0) tid=(3,8,0)
GPGPU-Sim PTX: 29400000 instructions simulated : ctaid=(12,14,0) tid=(5,1,0)
GPGPU-Sim PTX: 29500000 instructions simulated : ctaid=(8,14,0) tid=(3,2,0)
GPGPU-Sim uArch: cycles simulated: 47000  inst.: 29456612 (ipc=626.7) sim_rate=272746 (inst/sec) elapsed = 0:0:01:48 / Mon Jun 14 16:14:36 2021
GPGPU-Sim PTX: 29600000 instructions simulated : ctaid=(12,14,0) tid=(7,0,0)
GPGPU-Sim PTX: 29700000 instructions simulated : ctaid=(8,9,0) tid=(5,9,0)
GPGPU-Sim PTX: 29800000 instructions simulated : ctaid=(7,13,0) tid=(3,4,0)
GPGPU-Sim uArch: cycles simulated: 47500  inst.: 29786252 (ipc=627.1) sim_rate=270784 (inst/sec) elapsed = 0:0:01:50 / Mon Jun 14 16:14:38 2021
GPGPU-Sim PTX: 29900000 instructions simulated : ctaid=(5,14,0) tid=(7,0,0)
GPGPU-Sim PTX: 30000000 instructions simulated : ctaid=(12,9,0) tid=(3,8,0)
GPGPU-Sim PTX: 30100000 instructions simulated : ctaid=(7,9,0) tid=(9,9,0)
GPGPU-Sim uArch: cycles simulated: 48000  inst.: 30128620 (ipc=627.7) sim_rate=271429 (inst/sec) elapsed = 0:0:01:51 / Mon Jun 14 16:14:39 2021
GPGPU-Sim PTX: 30200000 instructions simulated : ctaid=(7,3,0) tid=(7,2,0)
GPGPU-Sim PTX: 30300000 instructions simulated : ctaid=(5,5,0) tid=(5,1,0)
GPGPU-Sim PTX: 30400000 instructions simulated : ctaid=(8,2,0) tid=(5,5,0)
GPGPU-Sim PTX: 30500000 instructions simulated : ctaid=(5,7,0) tid=(7,0,0)
GPGPU-Sim uArch: cycles simulated: 48500  inst.: 30458644 (ipc=628.0) sim_rate=271952 (inst/sec) elapsed = 0:0:01:52 / Mon Jun 14 16:14:40 2021
GPGPU-Sim PTX: 30600000 instructions simulated : ctaid=(8,9,0) tid=(9,3,0)
GPGPU-Sim PTX: 30700000 instructions simulated : ctaid=(11,6,0) tid=(9,7,0)
GPGPU-Sim PTX: 30800000 instructions simulated : ctaid=(14,14,0) tid=(5,3,0)
GPGPU-Sim uArch: cycles simulated: 49000  inst.: 30791028 (ipc=628.4) sim_rate=272486 (inst/sec) elapsed = 0:0:01:53 / Mon Jun 14 16:14:41 2021
GPGPU-Sim PTX: 30900000 instructions simulated : ctaid=(1,13,0) tid=(7,2,0)
GPGPU-Sim PTX: 31000000 instructions simulated : ctaid=(5,13,0) tid=(3,8,0)
GPGPU-Sim PTX: 31100000 instructions simulated : ctaid=(14,8,0) tid=(9,7,0)
GPGPU-Sim uArch: cycles simulated: 49500  inst.: 31060536 (ipc=627.5) sim_rate=272460 (inst/sec) elapsed = 0:0:01:54 / Mon Jun 14 16:14:42 2021
GPGPU-Sim uArch: cycles simulated: 50000  inst.: 31139460 (ipc=622.8) sim_rate=270777 (inst/sec) elapsed = 0:0:01:55 / Mon Jun 14 16:14:43 2021
GPGPU-Sim PTX: 31200000 instructions simulated : ctaid=(11,12,0) tid=(1,5,0)
GPGPU-Sim uArch: Shader 52 finished CTA #0 (50169,0), 1 CTAs running
GPGPU-Sim uArch: Shader 9 finished CTA #0 (50210,0), 1 CTAs running
GPGPU-Sim uArch: Shader 52 finished CTA #1 (50232,0), 0 CTAs running
GPGPU-Sim uArch: Shader 52 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 28 finished CTA #0 (50379,0), 1 CTAs running
GPGPU-Sim uArch: Shader 42 finished CTA #0 (50398,0), 1 CTAs running
GPGPU-Sim uArch: Shader 57 finished CTA #0 (50496,0), 1 CTAs running
GPGPU-Sim uArch: cycles simulated: 50500  inst.: 31175264 (ipc=617.3) sim_rate=268752 (inst/sec) elapsed = 0:0:01:56 / Mon Jun 14 16:14:44 2021
GPGPU-Sim uArch: Shader 65 finished CTA #0 (50597,0), 1 CTAs running
GPGPU-Sim uArch: Shader 104 finished CTA #0 (50638,0), 1 CTAs running
GPGPU-Sim uArch: Shader 30 finished CTA #1 (50665,0), 1 CTAs running
GPGPU-Sim uArch: Shader 112 finished CTA #0 (50754,0), 1 CTAs running
GPGPU-Sim uArch: Shader 28 finished CTA #1 (50810,0), 0 CTAs running
GPGPU-Sim uArch: Shader 28 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 57 finished CTA #1 (50829,0), 0 CTAs running
GPGPU-Sim uArch: Shader 57 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 75 finished CTA #0 (50837,0), 1 CTAs running
GPGPU-Sim uArch: Shader 106 finished CTA #0 (50839,0), 1 CTAs running
GPGPU-Sim uArch: Shader 60 finished CTA #0 (50843,0), 1 CTAs running
GPGPU-Sim uArch: Shader 47 finished CTA #0 (50845,0), 0 CTAs running
GPGPU-Sim uArch: Shader 47 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 88 finished CTA #0 (50861,0), 1 CTAs running
GPGPU-Sim uArch: Shader 9 finished CTA #1 (50862,0), 0 CTAs running
GPGPU-Sim uArch: Shader 9 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 88 finished CTA #1 (50894,0), 0 CTAs running
GPGPU-Sim uArch: Shader 88 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 112 finished CTA #1 (50899,0), 0 CTAs running
GPGPU-Sim uArch: Shader 112 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 87 finished CTA #0 (50903,0), 0 CTAs running
GPGPU-Sim uArch: Shader 87 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 60 finished CTA #1 (50905,0), 0 CTAs running
GPGPU-Sim uArch: Shader 60 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 55 finished CTA #0 (50915,0), 0 CTAs running
GPGPU-Sim uArch: Shader 55 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 30 finished CTA #0 (50933,0), 0 CTAs running
GPGPU-Sim uArch: Shader 30 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 32 finished CTA #0 (50948,0), 1 CTAs running
GPGPU-Sim uArch: Shader 39 finished CTA #0 (50953,0), 0 CTAs running
GPGPU-Sim uArch: Shader 39 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 73 finished CTA #0 (50959,0), 1 CTAs running
GPGPU-Sim uArch: Shader 114 finished CTA #0 (50971,0), 1 CTAs running
GPGPU-Sim uArch: Shader 80 finished CTA #0 (50985,0), 1 CTAs running
GPGPU-Sim uArch: Shader 0 finished CTA #0 (50990,0), 1 CTAs running
GPGPU-Sim uArch: cycles simulated: 51000  inst.: 31177832 (ipc=611.3) sim_rate=266477 (inst/sec) elapsed = 0:0:01:57 / Mon Jun 14 16:14:45 2021
GPGPU-Sim uArch: Shader 29 finished CTA #1 (51027,0), 1 CTAs running
GPGPU-Sim uArch: Shader 23 finished CTA #0 (51028,0), 0 CTAs running
GPGPU-Sim uArch: Shader 23 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 49 finished CTA #0 (51028,0), 1 CTAs running
GPGPU-Sim uArch: Shader 63 finished CTA #0 (51035,0), 0 CTAs running
GPGPU-Sim uArch: Shader 63 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 29 finished CTA #0 (51056,0), 0 CTAs running
GPGPU-Sim uArch: Shader 29 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 86 finished CTA #0 (51059,0), 1 CTAs running
GPGPU-Sim uArch: Shader 32 finished CTA #1 (51065,0), 0 CTAs running
GPGPU-Sim uArch: Shader 32 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 17 finished CTA #0 (51066,0), 1 CTAs running
GPGPU-Sim uArch: Shader 96 finished CTA #0 (51082,0), 1 CTAs running
GPGPU-Sim uArch: Shader 73 finished CTA #1 (51111,0), 0 CTAs running
GPGPU-Sim uArch: Shader 73 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 89 finished CTA #1 (51114,0), 1 CTAs running
GPGPU-Sim uArch: Shader 74 finished CTA #0 (51119,0), 1 CTAs running
GPGPU-Sim uArch: Shader 34 finished CTA #0 (51138,0), 1 CTAs running
GPGPU-Sim uArch: Shader 94 finished CTA #0 (51144,0), 1 CTAs running
GPGPU-Sim uArch: Shader 17 finished CTA #1 (51149,0), 0 CTAs running
GPGPU-Sim uArch: Shader 17 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 92 finished CTA #0 (51157,0), 1 CTAs running
GPGPU-Sim uArch: Shader 109 finished CTA #0 (51193,0), 1 CTAs running
GPGPU-Sim uArch: Shader 75 finished CTA #1 (51194,0), 0 CTAs running
GPGPU-Sim uArch: Shader 75 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 90 finished CTA #0 (51200,0), 1 CTAs running
GPGPU-Sim uArch: Shader 15 finished CTA #0 (51206,0), 0 CTAs running
GPGPU-Sim uArch: Shader 15 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 26 finished CTA #0 (51206,0), 1 CTAs running
GPGPU-Sim uArch: Shader 72 finished CTA #0 (51210,0), 1 CTAs running
GPGPU-Sim uArch: Shader 89 finished CTA #0 (51216,0), 0 CTAs running
GPGPU-Sim uArch: Shader 89 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 41 finished CTA #0 (51219,0), 1 CTAs running
GPGPU-Sim uArch: Shader 2 finished CTA #0 (51227,0), 1 CTAs running
GPGPU-Sim uArch: Shader 53 finished CTA #1 (51230,0), 1 CTAs running
GPGPU-Sim uArch: Shader 37 finished CTA #0 (51232,0), 1 CTAs running
GPGPU-Sim uArch: Shader 56 finished CTA #0 (51241,0), 1 CTAs running
GPGPU-Sim uArch: Shader 50 finished CTA #0 (51243,0), 1 CTAs running
GPGPU-Sim uArch: Shader 65 finished CTA #1 (51243,0), 0 CTAs running
GPGPU-Sim uArch: Shader 65 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 10 finished CTA #0 (51255,0), 1 CTAs running
GPGPU-Sim uArch: Shader 16 finished CTA #0 (51260,0), 1 CTAs running
GPGPU-Sim uArch: Shader 96 finished CTA #1 (51266,0), 0 CTAs running
GPGPU-Sim uArch: Shader 96 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 66 finished CTA #0 (51293,0), 1 CTAs running
GPGPU-Sim uArch: Shader 81 finished CTA #0 (51296,0), 1 CTAs running
GPGPU-Sim uArch: Shader 62 finished CTA #1 (51304,0), 1 CTAs running
GPGPU-Sim uArch: Shader 41 finished CTA #1 (51308,0), 0 CTAs running
GPGPU-Sim uArch: Shader 41 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 114 finished CTA #1 (51320,0), 0 CTAs running
GPGPU-Sim uArch: Shader 114 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 71 finished CTA #0 (51330,0), 0 CTAs running
GPGPU-Sim uArch: Shader 71 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 61 finished CTA #0 (51340,0), 1 CTAs running
GPGPU-Sim uArch: Shader 58 finished CTA #1 (51345,0), 1 CTAs running
GPGPU-Sim uArch: Shader 103 finished CTA #0 (51352,0), 0 CTAs running
GPGPU-Sim uArch: Shader 103 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 80 finished CTA #1 (51361,0), 0 CTAs running
GPGPU-Sim uArch: Shader 80 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 72 finished CTA #1 (51366,0), 0 CTAs running
GPGPU-Sim uArch: Shader 72 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 0 finished CTA #1 (51368,0), 0 CTAs running
GPGPU-Sim uArch: Shader 0 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 49 finished CTA #1 (51383,0), 0 CTAs running
GPGPU-Sim uArch: Shader 49 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 25 finished CTA #0 (51386,0), 1 CTAs running
GPGPU-Sim uArch: Shader 26 finished CTA #1 (51391,0), 0 CTAs running
GPGPU-Sim uArch: Shader 26 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 94 finished CTA #1 (51399,0), 0 CTAs running
GPGPU-Sim uArch: Shader 94 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 13 finished CTA #0 (51400,0), 1 CTAs running
GPGPU-Sim uArch: Shader 2 finished CTA #1 (51406,0), 0 CTAs running
GPGPU-Sim uArch: Shader 2 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 62 finished CTA #0 (51412,0), 0 CTAs running
GPGPU-Sim uArch: Shader 62 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 91 finished CTA #0 (51415,0), 1 CTAs running
GPGPU-Sim uArch: Shader 90 finished CTA #1 (51420,0), 0 CTAs running
GPGPU-Sim uArch: Shader 90 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 54 finished CTA #0 (51425,0), 1 CTAs running
GPGPU-Sim uArch: Shader 16 finished CTA #1 (51430,0), 0 CTAs running
GPGPU-Sim uArch: Shader 16 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 102 finished CTA #0 (51434,0), 1 CTAs running
GPGPU-Sim uArch: Shader 21 finished CTA #0 (51436,0), 1 CTAs running
GPGPU-Sim uArch: Shader 98 finished CTA #0 (51438,0), 1 CTAs running
GPGPU-Sim uArch: Shader 81 finished CTA #1 (51439,0), 0 CTAs running
GPGPU-Sim uArch: Shader 81 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 31 finished CTA #0 (51446,0), 0 CTAs running
GPGPU-Sim uArch: Shader 31 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 91 finished CTA #1 (51452,0), 0 CTAs running
GPGPU-Sim uArch: Shader 91 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 82 finished CTA #0 (51455,0), 1 CTAs running
GPGPU-Sim uArch: Shader 8 finished CTA #0 (51456,0), 1 CTAs running
GPGPU-Sim uArch: Shader 56 finished CTA #1 (51461,0), 0 CTAs running
GPGPU-Sim uArch: Shader 56 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 95 finished CTA #0 (51462,0), 0 CTAs running
GPGPU-Sim uArch: Shader 95 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 58 finished CTA #0 (51482,0), 0 CTAs running
GPGPU-Sim uArch: Shader 58 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 42 finished CTA #1 (51487,0), 0 CTAs running
GPGPU-Sim uArch: Shader 42 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 86 finished CTA #1 (51492,0), 0 CTAs running
GPGPU-Sim uArch: Shader 86 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 13 finished CTA #1 (51494,0), 0 CTAs running
GPGPU-Sim uArch: Shader 13 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: cycles simulated: 51500  inst.: 31182868 (ipc=605.5) sim_rate=264261 (inst/sec) elapsed = 0:0:01:58 / Mon Jun 14 16:14:46 2021
GPGPU-Sim uArch: Shader 64 finished CTA #1 (51500,0), 1 CTAs running
GPGPU-Sim uArch: Shader 107 finished CTA #0 (51500,0), 1 CTAs running
GPGPU-Sim uArch: Shader 99 finished CTA #1 (51503,0), 1 CTAs running
GPGPU-Sim uArch: Shader 10 finished CTA #1 (51508,0), 0 CTAs running
GPGPU-Sim uArch: Shader 10 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 20 finished CTA #0 (51509,0), 1 CTAs running
GPGPU-Sim uArch: Shader 76 finished CTA #1 (51516,0), 1 CTAs running
GPGPU-Sim uArch: Shader 82 finished CTA #1 (51518,0), 0 CTAs running
GPGPU-Sim uArch: Shader 82 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 83 finished CTA #0 (51528,0), 1 CTAs running
GPGPU-Sim uArch: Shader 99 finished CTA #0 (51530,0), 0 CTAs running
GPGPU-Sim uArch: Shader 99 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 107 finished CTA #1 (51533,0), 0 CTAs running
GPGPU-Sim uArch: Shader 107 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 59 finished CTA #0 (51536,0), 1 CTAs running
GPGPU-Sim uArch: Shader 33 finished CTA #1 (51538,0), 1 CTAs running
GPGPU-Sim uArch: Shader 106 finished CTA #1 (51549,0), 0 CTAs running
GPGPU-Sim uArch: Shader 106 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 18 finished CTA #0 (51553,0), 1 CTAs running
GPGPU-Sim uArch: Shader 35 finished CTA #0 (51559,0), 1 CTAs running
GPGPU-Sim uArch: Shader 61 finished CTA #1 (51560,0), 0 CTAs running
GPGPU-Sim uArch: Shader 61 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 40 finished CTA #1 (51569,0), 1 CTAs running
GPGPU-Sim uArch: Shader 76 finished CTA #0 (51582,0), 0 CTAs running
GPGPU-Sim uArch: Shader 76 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 83 finished CTA #1 (51582,0), 0 CTAs running
GPGPU-Sim uArch: Shader 83 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 25 finished CTA #1 (51588,0), 0 CTAs running
GPGPU-Sim uArch: Shader 25 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 3 finished CTA #0 (51593,0), 1 CTAs running
GPGPU-Sim uArch: Shader 27 finished CTA #0 (51594,0), 1 CTAs running
GPGPU-Sim uArch: Shader 33 finished CTA #0 (51602,0), 0 CTAs running
GPGPU-Sim uArch: Shader 33 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 48 finished CTA #1 (51614,0), 1 CTAs running
GPGPU-Sim uArch: Shader 119 finished CTA #0 (51624,0), 0 CTAs running
GPGPU-Sim uArch: Shader 119 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 11 finished CTA #0 (51626,0), 1 CTAs running
GPGPU-Sim uArch: Shader 84 finished CTA #0 (51628,0), 1 CTAs running
GPGPU-Sim uArch: Shader 97 finished CTA #0 (51634,0), 1 CTAs running
GPGPU-Sim uArch: Shader 18 finished CTA #1 (51639,0), 0 CTAs running
GPGPU-Sim uArch: Shader 18 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 93 finished CTA #0 (51640,0), 1 CTAs running
GPGPU-Sim uArch: Shader 78 finished CTA #0 (51650,0), 1 CTAs running
GPGPU-Sim uArch: Shader 22 finished CTA #0 (51652,0), 1 CTAs running
GPGPU-Sim uArch: Shader 36 finished CTA #0 (51656,0), 1 CTAs running
GPGPU-Sim uArch: Shader 108 finished CTA #0 (51657,0), 1 CTAs running
GPGPU-Sim uArch: Shader 6 finished CTA #0 (51660,0), 1 CTAs running
GPGPU-Sim uArch: Shader 40 finished CTA #0 (51662,0), 0 CTAs running
GPGPU-Sim uArch: Shader 40 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 84 finished CTA #1 (51666,0), 0 CTAs running
GPGPU-Sim uArch: Shader 84 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 92 finished CTA #1 (51668,0), 0 CTAs running
GPGPU-Sim uArch: Shader 92 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 115 finished CTA #0 (51668,0), 1 CTAs running
GPGPU-Sim uArch: Shader 24 finished CTA #1 (51670,0), 1 CTAs running
GPGPU-Sim uArch: Shader 24 finished CTA #0 (51674,0), 0 CTAs running
GPGPU-Sim uArch: Shader 24 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 3 finished CTA #1 (51678,0), 0 CTAs running
GPGPU-Sim uArch: Shader 3 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 59 finished CTA #1 (51683,0), 0 CTAs running
GPGPU-Sim uArch: Shader 59 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 8 finished CTA #1 (51687,0), 0 CTAs running
GPGPU-Sim uArch: Shader 8 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 115 finished CTA #1 (51688,0), 0 CTAs running
GPGPU-Sim uArch: Shader 115 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 85 finished CTA #0 (51689,0), 1 CTAs running
GPGPU-Sim uArch: Shader 111 finished CTA #0 (51699,0), 0 CTAs running
GPGPU-Sim uArch: Shader 111 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 116 finished CTA #0 (51699,0), 1 CTAs running
GPGPU-Sim uArch: Shader 34 finished CTA #1 (51701,0), 0 CTAs running
GPGPU-Sim uArch: Shader 34 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 54 finished CTA #1 (51715,0), 0 CTAs running
GPGPU-Sim uArch: Shader 54 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 104 finished CTA #1 (51715,0), 0 CTAs running
GPGPU-Sim uArch: Shader 104 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 48 finished CTA #0 (51730,0), 0 CTAs running
GPGPU-Sim uArch: Shader 48 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 20 finished CTA #1 (51739,0), 0 CTAs running
GPGPU-Sim uArch: Shader 20 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 11 finished CTA #1 (51741,0), 0 CTAs running
GPGPU-Sim uArch: Shader 11 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 97 finished CTA #1 (51748,0), 0 CTAs running
GPGPU-Sim uArch: Shader 97 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 113 finished CTA #0 (51749,0), 1 CTAs running
GPGPU-Sim uArch: Shader 113 finished CTA #1 (51751,0), 0 CTAs running
GPGPU-Sim uArch: Shader 113 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 27 finished CTA #1 (51754,0), 0 CTAs running
GPGPU-Sim uArch: Shader 27 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 53 finished CTA #0 (51754,0), 0 CTAs running
GPGPU-Sim uArch: Shader 53 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 105 finished CTA #0 (51756,0), 1 CTAs running
GPGPU-Sim uArch: Shader 43 finished CTA #0 (51767,0), 1 CTAs running
GPGPU-Sim uArch: Shader 36 finished CTA #1 (51771,0), 0 CTAs running
GPGPU-Sim uArch: Shader 36 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 105 finished CTA #1 (51776,0), 0 CTAs running
GPGPU-Sim uArch: Shader 105 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 45 finished CTA #0 (51783,0), 1 CTAs running
GPGPU-Sim uArch: Shader 93 finished CTA #1 (51788,0), 0 CTAs running
GPGPU-Sim uArch: Shader 93 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 67 finished CTA #0 (51790,0), 1 CTAs running
GPGPU-Sim uArch: Shader 50 finished CTA #1 (51794,0), 0 CTAs running
GPGPU-Sim uArch: Shader 50 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 67 finished CTA #1 (51796,0), 0 CTAs running
GPGPU-Sim uArch: Shader 67 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 7 finished CTA #0 (51797,0), 0 CTAs running
GPGPU-Sim uArch: Shader 7 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 19 finished CTA #0 (51797,0), 1 CTAs running
GPGPU-Sim uArch: Shader 74 finished CTA #1 (51800,0), 0 CTAs running
GPGPU-Sim uArch: Shader 74 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 43 finished CTA #1 (51801,0), 0 CTAs running
GPGPU-Sim uArch: Shader 43 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 102 finished CTA #1 (51802,0), 0 CTAs running
GPGPU-Sim uArch: Shader 102 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 79 finished CTA #0 (51810,0), 0 CTAs running
GPGPU-Sim uArch: Shader 79 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 98 finished CTA #1 (51811,0), 0 CTAs running
GPGPU-Sim uArch: Shader 98 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 85 finished CTA #1 (51813,0), 0 CTAs running
GPGPU-Sim uArch: Shader 85 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 51 finished CTA #0 (51824,0), 1 CTAs running
GPGPU-Sim uArch: Shader 66 finished CTA #1 (51833,0), 0 CTAs running
GPGPU-Sim uArch: Shader 66 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 12 finished CTA #0 (51840,0), 1 CTAs running
GPGPU-Sim uArch: Shader 35 finished CTA #1 (51840,0), 0 CTAs running
GPGPU-Sim uArch: Shader 35 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 44 finished CTA #0 (51841,0), 1 CTAs running
GPGPU-Sim uArch: Shader 101 finished CTA #1 (51842,0), 1 CTAs running
GPGPU-Sim uArch: Shader 64 finished CTA #0 (51847,0), 0 CTAs running
GPGPU-Sim uArch: Shader 64 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 37 finished CTA #1 (51851,0), 0 CTAs running
GPGPU-Sim uArch: Shader 37 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 77 finished CTA #0 (51851,0), 1 CTAs running
GPGPU-Sim uArch: Shader 51 finished CTA #1 (51855,0), 0 CTAs running
GPGPU-Sim uArch: Shader 51 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 109 finished CTA #1 (51857,0), 0 CTAs running
GPGPU-Sim uArch: Shader 109 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 12 finished CTA #1 (51860,0), 0 CTAs running
GPGPU-Sim uArch: Shader 12 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 6 finished CTA #1 (51862,0), 0 CTAs running
GPGPU-Sim uArch: Shader 6 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 116 finished CTA #1 (51867,0), 0 CTAs running
GPGPU-Sim uArch: Shader 116 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 101 finished CTA #0 (51870,0), 0 CTAs running
GPGPU-Sim uArch: Shader 101 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 117 finished CTA #1 (51871,0), 1 CTAs running
GPGPU-Sim uArch: Shader 38 finished CTA #0 (51881,0), 1 CTAs running
GPGPU-Sim uArch: Shader 19 finished CTA #1 (51882,0), 0 CTAs running
GPGPU-Sim uArch: Shader 19 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 38 finished CTA #1 (51887,0), 0 CTAs running
GPGPU-Sim uArch: Shader 38 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 117 finished CTA #0 (51887,0), 0 CTAs running
GPGPU-Sim uArch: Shader 117 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 1 finished CTA #0 (51902,0), 1 CTAs running
GPGPU-Sim uArch: Shader 44 finished CTA #1 (51906,0), 0 CTAs running
GPGPU-Sim uArch: Shader 44 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 14 finished CTA #0 (51907,0), 1 CTAs running
GPGPU-Sim uArch: Shader 21 finished CTA #1 (51908,0), 0 CTAs running
GPGPU-Sim uArch: Shader 21 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 14 finished CTA #1 (51913,0), 0 CTAs running
GPGPU-Sim uArch: Shader 14 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 1 finished CTA #1 (51923,0), 0 CTAs running
GPGPU-Sim uArch: Shader 1 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 22 finished CTA #1 (51928,0), 0 CTAs running
GPGPU-Sim uArch: Shader 22 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 100 finished CTA #0 (51933,0), 1 CTAs running
GPGPU-Sim uArch: Shader 4 finished CTA #0 (51939,0), 1 CTAs running
GPGPU-Sim uArch: Shader 110 finished CTA #0 (51939,0), 1 CTAs running
GPGPU-Sim uArch: Shader 78 finished CTA #1 (51944,0), 0 CTAs running
GPGPU-Sim uArch: Shader 78 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 108 finished CTA #1 (51951,0), 0 CTAs running
GPGPU-Sim uArch: Shader 108 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 45 finished CTA #1 (51964,0), 0 CTAs running
GPGPU-Sim uArch: Shader 45 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 68 finished CTA #0 (51966,0), 1 CTAs running
GPGPU-Sim uArch: Shader 77 finished CTA #1 (51966,0), 0 CTAs running
GPGPU-Sim uArch: Shader 77 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 68 finished CTA #1 (51969,0), 0 CTAs running
GPGPU-Sim uArch: Shader 68 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 118 finished CTA #0 (51986,0), 1 CTAs running
GPGPU-Sim uArch: Shader 70 finished CTA #0 (52000,0), 1 CTAs running
GPGPU-Sim uArch: Shader 46 finished CTA #0 (52004,0), 1 CTAs running
GPGPU-Sim uArch: Shader 4 finished CTA #1 (52012,0), 0 CTAs running
GPGPU-Sim uArch: Shader 4 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 100 finished CTA #1 (52016,0), 0 CTAs running
GPGPU-Sim uArch: Shader 100 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 110 finished CTA #1 (52020,0), 0 CTAs running
GPGPU-Sim uArch: Shader 110 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 118 finished CTA #1 (52024,0), 0 CTAs running
GPGPU-Sim uArch: Shader 118 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 5 finished CTA #0 (52044,0), 1 CTAs running
GPGPU-Sim uArch: Shader 46 finished CTA #1 (52044,0), 0 CTAs running
GPGPU-Sim uArch: Shader 46 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 69 finished CTA #0 (52048,0), 1 CTAs running
GPGPU-Sim uArch: Shader 5 finished CTA #1 (52061,0), 0 CTAs running
GPGPU-Sim uArch: Shader 5 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 69 finished CTA #1 (52107,0), 0 CTAs running
GPGPU-Sim uArch: Shader 69 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 70 finished CTA #1 (52113,0), 0 CTAs running
GPGPU-Sim uArch: Shader 70 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: GPU detected kernel '_Z14matrix_mul_gpuPiS_S_i' finished on shader 70.
kernel_name = _Z14matrix_mul_gpuPiS_S_i 
kernel_launch_uid = 1 
gpu_sim_cycle = 52114
gpu_sim_insn = 31185000
gpu_ipc =     598.3997
gpu_tot_sim_cycle = 52114
gpu_tot_sim_insn = 31185000
gpu_tot_ipc =     598.3997
gpu_tot_issued_cta = 0
gpu_stall_dramfull = 74963
gpu_stall_icnt2sh    = 150961
gpu_total_sim_rate=264279

========= Core cache stats =========
L1I_cache:
	L1I_total_cache_accesses = 696598
	L1I_total_cache_misses = 3598
	L1I_total_cache_miss_rate = 0.0052
	L1I_total_cache_pending_hits = 0
	L1I_total_cache_reservation_fails = 0
L1D_cache:
	L1D_cache_core[0]: Access = 40337, Miss = 2556, Miss_rate = 0.063, Pending_hits = 9587, Reservation_fails = 17954
	L1D_cache_core[1]: Access = 40291, Miss = 2547, Miss_rate = 0.063, Pending_hits = 9564, Reservation_fails = 17242
	L1D_cache_core[2]: Access = 40297, Miss = 2561, Miss_rate = 0.064, Pending_hits = 9568, Reservation_fails = 19637
	L1D_cache_core[3]: Access = 40231, Miss = 2551, Miss_rate = 0.063, Pending_hits = 9512, Reservation_fails = 20346
	L1D_cache_core[4]: Access = 40282, Miss = 2570, Miss_rate = 0.064, Pending_hits = 9553, Reservation_fails = 20125
	L1D_cache_core[5]: Access = 40276, Miss = 2563, Miss_rate = 0.064, Pending_hits = 9550, Reservation_fails = 21507
	L1D_cache_core[6]: Access = 40282, Miss = 2561, Miss_rate = 0.064, Pending_hits = 9549, Reservation_fails = 17583
	L1D_cache_core[7]: Access = 40322, Miss = 2555, Miss_rate = 0.063, Pending_hits = 9582, Reservation_fails = 19592
	L1D_cache_core[8]: Access = 40328, Miss = 2556, Miss_rate = 0.063, Pending_hits = 9572, Reservation_fails = 18611
	L1D_cache_core[9]: Access = 40323, Miss = 2556, Miss_rate = 0.063, Pending_hits = 9579, Reservation_fails = 17753
	L1D_cache_core[10]: Access = 40328, Miss = 2568, Miss_rate = 0.064, Pending_hits = 9588, Reservation_fails = 19928
	L1D_cache_core[11]: Access = 40322, Miss = 2570, Miss_rate = 0.064, Pending_hits = 9591, Reservation_fails = 18029
	L1D_cache_core[12]: Access = 40328, Miss = 2580, Miss_rate = 0.064, Pending_hits = 9588, Reservation_fails = 17457
	L1D_cache_core[13]: Access = 40338, Miss = 2572, Miss_rate = 0.064, Pending_hits = 9599, Reservation_fails = 17935
	L1D_cache_core[14]: Access = 40343, Miss = 2570, Miss_rate = 0.064, Pending_hits = 9605, Reservation_fails = 16746
	L1D_total_cache_accesses = 604628
	L1D_total_cache_misses = 38436
	L1D_total_cache_miss_rate = 0.0636
	L1D_total_cache_pending_hits = 143587
	L1D_total_cache_reservation_fails = 280445
	L1D_cache_data_port_util = 0.068
	L1D_cache_fill_port_util = 0.006
L1C_cache:
	L1C_total_cache_accesses = 3600
	L1C_total_cache_misses = 900
	L1C_total_cache_miss_rate = 0.2500
	L1C_total_cache_pending_hits = 0
	L1C_total_cache_reservation_fails = 0
L1T_cache:
	L1T_total_cache_accesses = 0
	L1T_total_cache_misses = 0
	L1T_total_cache_pending_hits = 0
	L1T_total_cache_reservation_fails = 0

Total_core_cache_stats:
	Total_core_cache_stats_breakdown[GLOBAL_ACC_R][HIT] = 422605
	Total_core_cache_stats_breakdown[GLOBAL_ACC_R][HIT_RESERVED] = 143587
	Total_core_cache_stats_breakdown[GLOBAL_ACC_R][MISS] = 34993
	Total_core_cache_stats_breakdown[GLOBAL_ACC_R][RESERVATION_FAIL] = 135803
	Total_core_cache_stats_breakdown[CONST_ACC_R][HIT] = 2700
	Total_core_cache_stats_breakdown[CONST_ACC_R][MISS] = 900
	Total_core_cache_stats_breakdown[GLOBAL_ACC_W][MISS] = 3443
	Total_core_cache_stats_breakdown[GLOBAL_ACC_W][RESERVATION_FAIL] = 144642
	Total_core_cache_stats_breakdown[INST_ACC_R][HIT] = 693000
	Total_core_cache_stats_breakdown[INST_ACC_R][MISS] = 3598
Shader 0 warp_id issue ditsribution:
warp_id:
0, 1, 2, 3, 4, 5, 6, 7, 
distro:
1388, 1388, 1388, 1388, 1388, 1388, 1388, 1388, 
gpgpu_n_tot_thrd_icount = 39974400
gpgpu_n_tot_w_icount = 1249200
gpgpu_n_stall_shd_mem = 614173
gpgpu_n_mem_read_local = 0
gpgpu_n_mem_write_local = 0
gpgpu_n_mem_read_global = 34993
gpgpu_n_mem_write_global = 3443
gpgpu_n_mem_texture = 0
gpgpu_n_mem_const = 120
gpgpu_n_load_insn  = 6750000
gpgpu_n_store_insn = 22500
gpgpu_n_shmem_insn = 0
gpgpu_n_tex_insn = 0
gpgpu_n_const_mem_insn = 0
gpgpu_n_param_mem_insn = 90000
gpgpu_n_shmem_bkconflict = 0
gpgpu_n_cache_bkconflict = 0
gpgpu_n_intrawarp_mshr_merge = 0
gpgpu_n_cmem_portconflict = 0
gpgpu_stall_shd_mem[c_mem][bk_conf] = 0
gpgpu_stall_shd_mem[c_mem][mshr_rc] = 0
gpgpu_stall_shd_mem[c_mem][icnt_rc] = 0
gpgpu_stall_shd_mem[c_mem][data_port_stall] = 0
gpgpu_stall_shd_mem[t_mem][mshr_rc] = 0
gpgpu_stall_shd_mem[t_mem][icnt_rc] = 0
gpgpu_stall_shd_mem[t_mem][data_port_stall] = 0
gpgpu_stall_shd_mem[s_mem][bk_conf] = 0
gpgpu_stall_shd_mem[gl_mem][bk_conf] = 0
gpgpu_stall_shd_mem[gl_mem][coal_stall] = 614173
gpgpu_stall_shd_mem[gl_mem][data_port_stall] = 0
gpgpu_stall_shd_mem[g_mem_ld][mshr_rc] = 0
gpgpu_stall_shd_mem[g_mem_ld][icnt_rc] = 0
gpgpu_stall_shd_mem[g_mem_ld][wb_icnt_rc] = 0
gpgpu_stall_shd_mem[g_mem_ld][wb_rsrv_fail] = 0
gpgpu_stall_shd_mem[g_mem_st][mshr_rc] = 0
gpgpu_stall_shd_mem[g_mem_st][icnt_rc] = 0
gpgpu_stall_shd_mem[g_mem_st][wb_icnt_rc] = 0
gpgpu_stall_shd_mem[g_mem_st][wb_rsrv_fail] = 0
gpgpu_stall_shd_mem[l_mem_ld][mshr_rc] = 0
gpgpu_stall_shd_mem[l_mem_ld][icnt_rc] = 0
gpgpu_stall_shd_mem[l_mem_ld][wb_icnt_rc] = 0
gpgpu_stall_shd_mem[l_mem_ld][wb_rsrv_fail] = 0
gpgpu_stall_shd_mem[l_mem_st][mshr_rc] = 0
gpgpu_stall_shd_mem[l_mem_st][icnt_rc] = 0
gpgpu_stall_shd_mem[l_mem_ld][wb_icnt_rc] = 0
gpgpu_stall_shd_mem[l_mem_ld][wb_rsrv_fail] = 0
gpu_reg_bank_conflict_stalls = 0
Warp Occupancy Distribution:
Stall:441805	W0_Idle:1288598	W0_Scoreboard:9482477	W1:0	W2:0	W3:0	W4:312300	W5:0	W6:0	W7:0	W8:0	W9:0	W10:0	W11:0	W12:0	W13:0	W14:0	W15:0	W16:0	W17:0	W18:0	W19:0	W20:0	W21:0	W22:0	W23:0	W24:0	W25:0	W26:0	W27:0	W28:0	W29:0	W30:0	W31:0	W32:936900
traffic_breakdown_coretomem[CONST_ACC_R] = 960 {8:120,}
traffic_breakdown_coretomem[GLOBAL_ACC_R] = 279944 {8:34993,}
traffic_breakdown_coretomem[GLOBAL_ACC_W] = 220472 {40:1891,72:1035,136:517,}
traffic_breakdown_coretomem[INST_ACC_R] = 3840 {8:480,}
traffic_breakdown_memtocore[CONST_ACC_R] = 8640 {72:120,}
traffic_breakdown_memtocore[GLOBAL_ACC_R] = 4759048 {136:34993,}
traffic_breakdown_memtocore[GLOBAL_ACC_W] = 27544 {8:3443,}
traffic_breakdown_memtocore[INST_ACC_R] = 65280 {136:480,}
maxmrqlatency = 264 
maxdqlatency = 0 
maxmflatency = 2966 
averagemflatency = 331 
max_icnt2mem_latency = 3027 
max_icnt2sh_latency = 52113 
mrq_lat_table:1864 	85 	65 	133 	191 	266 	357 	250 	1 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
dq_lat_table:0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
mf_lat_table:0 	0 	0 	0 	0 	0 	0 	18641 	15624 	2440 	1703 	148 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
icnt2mem_lat_table:0 	0 	0 	25799 	1252 	2017 	3082 	2408 	1558 	1817 	996 	107 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
icnt2sh_lat_table:0 	0 	0 	5374 	26785 	2935 	19 	0 	0 	0 	0 	0 	0 	0 	0 	3443 	0 	0 	0 	0 	0 	0 	0 	0 	
mf_lat_pw_table:0 	0 	0 	0 	0 	0 	0 	80 	12 	6 	4 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
maximum concurrent accesses to same row:
dram[0]:         1         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[1]:         2         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[2]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[3]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[4]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[5]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
maximum service time to same row:
dram[0]:     48986     50145         0         0      2487      2476      2843      3294      2488      3444     15918     16579     40804     41559     49682     49938 
dram[1]:     47947     50759         0         0      2569      2004      2601      2463      2029      3357     15954     16900     40935     41738     49974     49682 
dram[2]:     50116     49853         0         0      2013      2401      2741      2529      1994      3279     16219     16887     41074     41907     49691     49690 
dram[3]:     49710     50845         0         0      2009      2478      2429      2621      2751      3897     16263     17171     41160     41972     49677     49878 
dram[4]:     50237     49850         0         0      2488      2003      3332      2538      2485      3278     16469     17200     41247     42166     49919     49687 
dram[5]:     49859     50068         0         0      2007      2513      2469      3322      3301      3935     16591     17199     41469     42218     49687     49888 
average row accesses per activate:
dram[0]:  4.250000 14.000000      -nan      -nan 10.000000 10.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 64.000000 72.000000 83.000000 77.000000 
dram[1]:  7.000000 14.000000      -nan      -nan 10.000000 10.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 77.000000 61.000000 82.000000 85.000000 
dram[2]: 14.000000  9.000000      -nan      -nan 10.000000 12.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 63.000000 63.000000 86.000000 81.000000 
dram[3]: 13.000000  9.000000      -nan      -nan 10.000000 12.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 63.000000 68.000000 75.000000 85.000000 
dram[4]: 15.000000 11.000000      -nan      -nan 10.000000 12.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 68.000000 65.000000 78.000000 88.000000 
dram[5]: 16.000000  8.000000      -nan      -nan 10.000000 12.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 65.000000 67.000000 81.000000 81.000000 
average row locality = 3212/88 = 36.500000
number of total memory accesses made:
dram[0]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[1]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[2]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[3]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[4]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[5]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
total accesses: 0
min_bank_accesses = 0!
min_chip_accesses = 0!
number of total read accesses:
dram[0]:         9         6         0         0        10        10        32        32        32        32        32        32        32        32        32        32 
dram[1]:         8         6         0         0        10        10        32        32        32        32        32        32        32        32        32        32 
dram[2]:         6         4         0         0        10        12        32        32        32        32        32        32        32        32        32        32 
dram[3]:         6         4         0         0        10        12        32        32        32        32        32        32        32        32        32        32 
dram[4]:         6         4         0         0        10        12        32        32        32        32        32        32        32        32        32        32 
dram[5]:         6         4         0         0        10        12        32        32        32        32        32        32        32        32        32        32 
total reads: 2117
min_bank_accesses = 0!
chip skew: 355/352 = 1.01
number of total write accesses:
dram[0]:         8         8         0         0         0         0         0         0         0         0         0         0        32        40        51        45 
dram[1]:         6         8         0         0         0         0         0         0         0         0         0         0        45        29        50        53 
dram[2]:         8         5         0         0         0         0         0         0         0         0         0         0        31        31        54        49 
dram[3]:         7         5         0         0         0         0         0         0         0         0         0         0        31        36        43        53 
dram[4]:         9         7         0         0         0         0         0         0         0         0         0         0        36        33        46        56 
dram[5]:        10         4         0         0         0         0         0         0         0         0         0         0        33        35        49        49 
total reads: 1095
min_bank_accesses = 0!
chip skew: 191/175 = 1.09
average mf latency per bank:
dram[0]:       5972      1408    none      none        7624      6015      7869      7128      9772      7109      8243      8142      2583      2276      1126      1269
dram[1]:        832      1099    none      none        4678      6681      8215      6378      8926      6471      8058      8269      2353      2426      1412      1324
dram[2]:       1016       863    none      none        7296      5077      8207      5996      7922      6544      8246      8380      2536      2313      1209      1387
dram[3]:       1202       900    none      none        3901      6944      7219      6985      7134      7113      8836      8254      2245      2250      1322      1180
dram[4]:        954      1031    none      none        7742      4595      6331      6396      6875      6570      8289      8526      2078      2526      1299      1065
dram[5]:       1138      1430    none      none        4617      6835      5760      7269      6703      6460      7953      8525      2352      2337      1205      1207
maximum mf latency per bank:
dram[0]:       1445      1801         0         0      1719      1494      2630      2529      2515      1555       378       394      1663      1710      1771      1766
dram[1]:       1412      1863         0         0      1419      1625      2966      1991      2270      1260       385       394      1617      1737      1882      1992
dram[2]:       1748      1134         0         0      1929      1216      2610      2175      2008      1289       410       388      1659      1761      1974      1745
dram[3]:       1799      1558         0         0       615      1704      2483      1767      1313      1553       427       428      1645      1618      1735      1982
dram[4]:       1846      1376         0         0      1908       939      2307      2711      1561      1338       425       399      1707      1747      1971      1751
dram[5]:       1873      1435         0         0       784      1584      1609      2693      1359      1522       431       406      1760      1622      1768      1979

Number of Memory Banks Accessed per Memory Operation per Warp (from 0):
0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	
Average # of Memory Banks Accessed per Memory Operation per Warp=-nan

position of mrq chosen
0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	

average position of mrq chosen = -nan
Memory Partition 0: 
Cache L2_bank_000:
MSHR contents

Cache L2_bank_001:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[0]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67855 n_act=17 n_pre=3 n_req=539 n_rd=710 n_write=205 bw_util=0.0266
n_activity=5774 dram_eff=0.3169
bk0: 18a 68542i bk1: 12a 68532i bk2: 0a 68787i bk3: 0a 68789i bk4: 20a 68735i bk5: 20a 68735i bk6: 64a 68637i bk7: 64a 68637i bk8: 64a 68640i bk9: 64a 68645i bk10: 64a 68647i bk11: 64a 68647i bk12: 64a 67873i bk13: 64a 67623i bk14: 64a 67349i bk15: 64a 67377i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.30346
Memory Partition 1: 
Cache L2_bank_002:
MSHR contents

Cache L2_bank_003:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[1]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67854 n_act=15 n_pre=1 n_req=545 n_rd=708 n_write=212 bw_util=0.02675
n_activity=5886 dram_eff=0.3126
bk0: 16a 68532i bk1: 12a 68588i bk2: 0a 68788i bk3: 0a 68788i bk4: 20a 68735i bk5: 20a 68733i bk6: 64a 68638i bk7: 64a 68644i bk8: 64a 68647i bk9: 64a 68644i bk10: 64a 68645i bk11: 64a 68651i bk12: 64a 67904i bk13: 64a 67987i bk14: 64a 67318i bk15: 64a 67259i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.334918
Memory Partition 2: 
Cache L2_bank_004:
MSHR contents

Cache L2_bank_005:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[2]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67873 n_act=14 n_pre=0 n_req=530 n_rd=704 n_write=199 bw_util=0.02625
n_activity=5889 dram_eff=0.3067
bk0: 12a 68632i bk1: 8a 68675i bk2: 0a 68788i bk3: 0a 68789i bk4: 20a 68736i bk5: 24a 68729i bk6: 64a 68644i bk7: 64a 68646i bk8: 64a 68647i bk9: 64a 68651i bk10: 64a 68648i bk11: 64a 68648i bk12: 64a 67950i bk13: 64a 67719i bk14: 64a 67369i bk15: 64a 67364i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.253743
Memory Partition 3: 
Cache L2_bank_006:
MSHR contents

Cache L2_bank_007:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[3]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67880 n_act=14 n_pre=0 n_req=527 n_rd=704 n_write=192 bw_util=0.02605
n_activity=5725 dram_eff=0.313
bk0: 12a 68629i bk1: 8a 68661i bk2: 0a 68786i bk3: 0a 68786i bk4: 20a 68738i bk5: 24a 68729i bk6: 64a 68642i bk7: 64a 68643i bk8: 64a 68643i bk9: 64a 68649i bk10: 64a 68638i bk11: 64a 68650i bk12: 64a 67791i bk13: 64a 67660i bk14: 64a 67449i bk15: 64a 67385i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.27991
Memory Partition 4: 
Cache L2_bank_008:
MSHR contents

Cache L2_bank_009:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[4]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67869 n_act=14 n_pre=0 n_req=539 n_rd=704 n_write=203 bw_util=0.02637
n_activity=5871 dram_eff=0.309
bk0: 12a 68578i bk1: 8a 68708i bk2: 0a 68790i bk3: 0a 68790i bk4: 20a 68735i bk5: 24a 68730i bk6: 64a 68649i bk7: 64a 68649i bk8: 64a 68645i bk9: 64a 68641i bk10: 64a 68643i bk11: 64a 68647i bk12: 64a 67633i bk13: 64a 67711i bk14: 64a 67323i bk15: 64a 67376i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.317125
Memory Partition 5: 
Cache L2_bank_010:
MSHR contents

Cache L2_bank_011:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[5]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67872 n_act=14 n_pre=0 n_req=532 n_rd=704 n_write=200 bw_util=0.02628
n_activity=5855 dram_eff=0.3088
bk0: 12a 68631i bk1: 8a 68658i bk2: 0a 68787i bk3: 0a 68787i bk4: 20a 68731i bk5: 24a 68730i bk6: 64a 68647i bk7: 64a 68643i bk8: 64a 68645i bk9: 64a 68645i bk10: 64a 68648i bk11: 64a 68648i bk12: 64a 67817i bk13: 64a 67748i bk14: 64a 67487i bk15: 64a 67461i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.286524

========= L2 cache stats =========
L2_cache_bank[0]: Access = 3621, Miss = 179, Miss_rate = 0.049, Pending_hits = 321, Reservation_fails = 4452
L2_cache_bank[1]: Access = 3181, Miss = 176, Miss_rate = 0.055, Pending_hits = 296, Reservation_fails = 4350
L2_cache_bank[2]: Access = 3450, Miss = 178, Miss_rate = 0.052, Pending_hits = 315, Reservation_fails = 4369
L2_cache_bank[3]: Access = 3206, Miss = 176, Miss_rate = 0.055, Pending_hits = 296, Reservation_fails = 4198
L2_cache_bank[4]: Access = 3229, Miss = 176, Miss_rate = 0.055, Pending_hits = 297, Reservation_fails = 4050
L2_cache_bank[5]: Access = 3178, Miss = 176, Miss_rate = 0.055, Pending_hits = 292, Reservation_fails = 3913
L2_cache_bank[6]: Access = 3180, Miss = 176, Miss_rate = 0.055, Pending_hits = 301, Reservation_fails = 4277
L2_cache_bank[7]: Access = 3209, Miss = 176, Miss_rate = 0.055, Pending_hits = 306, Reservation_fails = 3642
L2_cache_bank[8]: Access = 3181, Miss = 176, Miss_rate = 0.055, Pending_hits = 298, Reservation_fails = 3777
L2_cache_bank[9]: Access = 3209, Miss = 176, Miss_rate = 0.055, Pending_hits = 308, Reservation_fails = 4112
L2_cache_bank[10]: Access = 3193, Miss = 176, Miss_rate = 0.055, Pending_hits = 290, Reservation_fails = 3935
L2_cache_bank[11]: Access = 3199, Miss = 176, Miss_rate = 0.055, Pending_hits = 301, Reservation_fails = 4179
L2_total_cache_accesses = 39036
L2_total_cache_misses = 2117
L2_total_cache_miss_rate = 0.0542
L2_total_cache_pending_hits = 3621
L2_total_cache_reservation_fails = 49254
L2_total_cache_breakdown:
	L2_cache_stats_breakdown[GLOBAL_ACC_R][HIT] = 30369
	L2_cache_stats_breakdown[GLOBAL_ACC_R][HIT_RESERVED] = 3216
	L2_cache_stats_breakdown[GLOBAL_ACC_R][MISS] = 1408
	L2_cache_stats_breakdown[GLOBAL_ACC_R][RESERVATION_FAIL] = 48229
	L2_cache_stats_breakdown[CONST_ACC_R][HIT] = 116
	L2_cache_stats_breakdown[CONST_ACC_R][HIT_RESERVED] = 3
	L2_cache_stats_breakdown[CONST_ACC_R][MISS] = 1
	L2_cache_stats_breakdown[CONST_ACC_R][RESERVATION_FAIL] = 129
	L2_cache_stats_breakdown[GLOBAL_ACC_W][HIT] = 2348
	L2_cache_stats_breakdown[GLOBAL_ACC_W][HIT_RESERVED] = 391
	L2_cache_stats_breakdown[GLOBAL_ACC_W][MISS] = 704
	L2_cache_stats_breakdown[GLOBAL_ACC_W][RESERVATION_FAIL] = 552
	L2_cache_stats_breakdown[INST_ACC_R][HIT] = 465
	L2_cache_stats_breakdown[INST_ACC_R][HIT_RESERVED] = 11
	L2_cache_stats_breakdown[INST_ACC_R][MISS] = 4
	L2_cache_stats_breakdown[INST_ACC_R][RESERVATION_FAIL] = 344
L2_cache_data_port_util = 0.204
L2_cache_fill_port_util = 0.014

icnt_total_pkts_mem_to_simt=181168
icnt_total_pkts_simt_to_mem=45065
LD_mem_lat_dist  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ST_mem_lat_dist  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
----------------------------Interconnect-DETAILS--------------------------------
Class 0:
Packet latency average = 26.5755
	minimum = 6
	maximum = 856
Network latency average = 18.7063
	minimum = 6
	maximum = 797
Slowest packet = 1042
Flit latency average = 13.4921
	minimum = 6
	maximum = 797
Slowest flit = 2494
Fragmentation average = 0.0075187
	minimum = 0
	maximum = 332
Injected packet rate average = 0.0554852
	minimum = 0.0496412 (at node 1)
	maximum = 0.0694823 (at node 15)
Accepted packet rate average = 0.0554852
	minimum = 0.0496412 (at node 1)
	maximum = 0.0694823 (at node 15)
Injected flit rate average = 0.160782
	minimum = 0.0571056 (at node 1)
	maximum = 0.320317 (at node 15)
Accepted flit rate average= 0.160782
	minimum = 0.0703074 (at node 21)
	maximum = 0.233181 (at node 12)
Injected packet length average = 2.89775
Accepted packet length average = 2.89775
Total in-flight flits = 0 (0 measured)
====== Overall Traffic Statistics ======
====== Traffic class 0 ======
Packet latency average = 26.5755 (1 samples)
	minimum = 6 (1 samples)
	maximum = 856 (1 samples)
Network latency average = 18.7063 (1 samples)
	minimum = 6 (1 samples)
	maximum = 797 (1 samples)
Flit latency average = 13.4921 (1 samples)
	minimum = 6 (1 samples)
	maximum = 797 (1 samples)
Fragmentation average = 0.0075187 (1 samples)
	minimum = 0 (1 samples)
	maximum = 332 (1 samples)
Injected packet rate average = 0.0554852 (1 samples)
	minimum = 0.0496412 (1 samples)
	maximum = 0.0694823 (1 samples)
Accepted packet rate average = 0.0554852 (1 samples)
	minimum = 0.0496412 (1 samples)
	maximum = 0.0694823 (1 samples)
Injected flit rate average = 0.160782 (1 samples)
	minimum = 0.0571056 (1 samples)
	maximum = 0.320317 (1 samples)
Accepted flit rate average = 0.160782 (1 samples)
	minimum = 0.0703074 (1 samples)
	maximum = 0.233181 (1 samples)
Injected packet size average = 2.89775 (1 samples)
Accepted packet size average = 2.89775 (1 samples)
Hops average = 1 (1 samples)
----------------------------END-of-Interconnect-DETAILS-------------------------


gpgpu_simulation_time = 0 days, 0 hrs, 1 min, 58 sec (118 sec)
gpgpu_simulation_rate = 264279 (inst/sec)
gpgpu_simulation_rate = 441 (cycle/sec)
total time is 118079 ms


        *** GPGPU-Sim Simulator Version 3.2.2  [build 0] ***


GPGPU-Sim PTX: simulation mode 0 (can change with PTX_SIM_MODE_FUNC environment variable:
               1=functional simulation only, 0=detailed performance simulator)
GPGPU-Sim: Configuration options:

-network_mode                           1 # Interconnection network mode
-inter_config_file   config_fermi_islip.icnt # Interconnection network config file
-gpgpu_ptx_use_cuobjdump                    1 # Use cuobjdump to extract ptx and sass from binaries
-gpgpu_experimental_lib_support                    0 # Try to extract code from cuda libraries [Broken because of unknown cudaGetExportTable]
-gpgpu_ptx_convert_to_ptxplus                    0 # Convert SASS (native ISA) to ptxplus and run ptxplus
-gpgpu_ptx_force_max_capability                   20 # Force maximum compute capability
-gpgpu_ptx_inst_debug_to_file                    0 # Dump executed instructions' debug information to file
-gpgpu_ptx_inst_debug_file       inst_debug.txt # Executed instructions' debug output file
-gpgpu_ptx_inst_debug_thread_uid                    1 # Thread UID for executed instructions' debug output
-gpgpu_simd_model                       1 # 1 = post-dominator
-gpgpu_shader_core_pipeline              1536:32 # shader core pipeline config, i.e., {<nthread>:<warpsize>}
-gpgpu_tex_cache:l1  4:128:24,L:R:m:N:L,F:128:4,128:2 # per-shader L1 texture cache  (READ-ONLY) config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq>:<rf>}
-gpgpu_const_cache:l1 64:64:2,L:R:f:N:L,A:2:32,4 # per-shader L1 constant memory cache  (READ-ONLY) config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq>} 
-gpgpu_cache:il1     4:128:4,L:R:f:N:L,A:2:32,4 # shader L1 instruction cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq>} 
-gpgpu_cache:dl1     32:128:4,L:L:m:N:H,A:32:8,8 # per-shader L1 data cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq> | none}
-gpgpu_cache:dl1PrefL1                 none # per-shader L1 data cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq> | none}
-gpgpu_cache:dl1PreShared                 none # per-shader L1 data cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq> | none}
-gmem_skip_L1D                          0 # global memory access skip L1D cache (implements -Xptxas -dlcm=cg, default=no skip)
-gpgpu_perfect_mem                      0 # enable perfect memory mode (no cache miss)
-n_regfile_gating_group                    4 # group of lanes that should be read/written together)
-gpgpu_clock_gated_reg_file                    0 # enable clock gated reg file for power calculations
-gpgpu_clock_gated_lanes                    0 # enable clock gated lanes for power calculations
-gpgpu_shader_registers                32768 # Number of registers per shader core. Limits number of concurrent CTAs. (default 8192)
-gpgpu_shader_cta                       8 # Maximum number of concurrent CTAs in shader (default 8)
-gpgpu_num_cta_barriers                   16 # Maximum number of named barriers per CTA (default 16)
-gpgpu_n_clusters                      15 # number of processing clusters
-gpgpu_n_cores_per_cluster                    8 # number of simd cores per cluster
-gpgpu_n_cluster_ejection_buffer_size                    8 # number of packets in ejection buffer
-gpgpu_n_ldst_response_buffer_size                    2 # number of response packets in ld/st unit ejection buffer
-gpgpu_shmem_size                   16384 # Size of shared memory per shader core (default 16kB)
-gpgpu_shmem_size                   49152 # Size of shared memory per shader core (default 16kB)
-gpgpu_shmem_size_PrefL1                16384 # Size of shared memory per shader core (default 16kB)
-gpgpu_shmem_size_PrefShared                16384 # Size of shared memory per shader core (default 16kB)
-gpgpu_shmem_num_banks                   32 # Number of banks in the shared memory in each shader core (default 16)
-gpgpu_shmem_limited_broadcast                    0 # Limit shared memory to do one broadcast per cycle (default on)
-gpgpu_shmem_warp_parts                    1 # Number of portions a warp is divided into for shared memory bank conflict check 
-gpgpu_warpdistro_shader                   -1 # Specify which shader core to collect the warp size distribution from
-gpgpu_warp_issue_shader                    0 # Specify which shader core to collect the warp issue distribution from
-gpgpu_local_mem_map                    1 # Mapping from local memory space address to simulated GPU physical address space (default = enabled)
-gpgpu_num_reg_banks                   16 # Number of register banks (default = 8)
-gpgpu_reg_bank_use_warp_id                    0 # Use warp ID in mapping registers to banks (default = off)
-gpgpu_operand_collector_num_units_sp                    6 # number of collector units (default = 4)
-gpgpu_operand_collector_num_units_sfu                    8 # number of collector units (default = 4)
-gpgpu_operand_collector_num_units_mem                    2 # number of collector units (default = 2)
-gpgpu_operand_collector_num_units_gen                    0 # number of collector units (default = 0)
-gpgpu_operand_collector_num_in_ports_sp                    2 # number of collector unit in ports (default = 1)
-gpgpu_operand_collector_num_in_ports_sfu                    1 # number of collector unit in ports (default = 1)
-gpgpu_operand_collector_num_in_ports_mem                    1 # number of collector unit in ports (default = 1)
-gpgpu_operand_collector_num_in_ports_gen                    0 # number of collector unit in ports (default = 0)
-gpgpu_operand_collector_num_out_ports_sp                    2 # number of collector unit in ports (default = 1)
-gpgpu_operand_collector_num_out_ports_sfu                    1 # number of collector unit in ports (default = 1)
-gpgpu_operand_collector_num_out_ports_mem                    1 # number of collector unit in ports (default = 1)
-gpgpu_operand_collector_num_out_ports_gen                    0 # number of collector unit in ports (default = 0)
-gpgpu_coalesce_arch                   13 # Coalescing arch (default = 13, anything else is off for now)
-gpgpu_num_sched_per_core                    2 # Number of warp schedulers per core
-gpgpu_max_insn_issue_per_warp                    1 # Max number of instructions that can be issued per warp in one cycle by scheduler
-gpgpu_simt_core_sim_order                    1 # Select the simulation order of cores in a cluster (0=Fix, 1=Round-Robin)
-gpgpu_pipeline_widths        2,1,1,2,1,1,2 # Pipeline widths ID_OC_SP,ID_OC_SFU,ID_OC_MEM,OC_EX_SP,OC_EX_SFU,OC_EX_MEM,EX_WB
-gpgpu_num_sp_units                     2 # Number of SP units (default=1)
-gpgpu_num_sfu_units                    1 # Number of SF units (default=1)
-gpgpu_num_mem_units                    1 # Number if ldst units (default=1) WARNING: not hooked up to anything
-gpgpu_scheduler                      gto # Scheduler configuration: < lrr | gto | two_level_active > If two_level_active:<num_active_warps>:<inner_prioritization>:<outer_prioritization>For complete list of prioritization values see shader.h enum scheduler_prioritization_typeDefault: gto
-gpgpu_dram_scheduler                    1 # 0 = fifo, 1 = FR-FCFS (defaul)
-gpgpu_dram_partition_queues              8:8:8:8 # i2$:$2d:d2$:$2i
-l2_ideal                               0 # Use a ideal L2 cache that always hit
-gpgpu_cache:dl2     64:128:8,L:B:m:W:L,A:32:4,4:0,32 # unified banked L2 data cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq>}
-gpgpu_cache:dl2_texture_only                    0 # L2 cache used for texture only
-gpgpu_n_mem                            6 # number of memory modules (e.g. memory controllers) in gpu
-gpgpu_n_sub_partition_per_mchannel                    2 # number of memory subpartition in each memory module
-gpgpu_n_mem_per_ctrlr                    2 # number of memory chips per memory controller
-gpgpu_memlatency_stat                   14 # track and display latency statistics 0x2 enables MC, 0x4 enables queue logs
-gpgpu_frfcfs_dram_sched_queue_size                   16 # 0 = unlimited (default); # entries per chip
-gpgpu_dram_return_queue_size                  116 # 0 = unlimited (default); # entries per chip
-gpgpu_dram_buswidth                    4 # default = 4 bytes (8 bytes per cycle at DDR)
-gpgpu_dram_burst_length                    8 # Burst length of each DRAM request (default = 4 data bus cycle)
-dram_data_command_freq_ratio                    4 # Frequency ratio between DRAM data bus and command bus (default = 2 times, i.e. DDR)
-gpgpu_dram_timing_opt nbk=16:CCD=2:RRD=6:RCD=12:RAS=28:RP=12:RC=40: CL=12:WL=4:CDLR=5:WR=12:nbkgrp=4:CCDL=3:RTPL=2 # DRAM timing parameters = {nbk:tCCD:tRRD:tRCD:tRAS:tRP:tRC:CL:WL:tCDLR:tWR:nbkgrp:tCCDL:tRTPL}
-rop_latency                          120 # ROP queue latency (default 85)
-dram_latency                         100 # DRAM latency (default 30)
-gpgpu_mem_addr_mapping dramid@8;00000000.00000000.00000000.00000000.0000RRRR.RRRRRRRR.BBBCCCCB.CCSSSSSS # mapping memory address to dram model {dramid@<start bit>;<memory address map>}
-gpgpu_mem_addr_test                    0 # run sweep test to check address mapping for aliased address
-gpgpu_mem_address_mask                    1 # 0 = old addressing mask, 1 = new addressing mask, 2 = new add. mask + flipped bank sel and chip sel bits
-gpuwattch_xml_file  gpuwattch_gtx480.xml # GPUWattch XML file
-power_simulation_enabled                    1 # Turn on power simulator (1=On, 0=Off)
-power_per_cycle_dump                    0 # Dump detailed power output each cycle
-power_trace_enabled                    0 # produce a file for the power trace (1=On, 0=Off)
-power_trace_zlevel                     6 # Compression level of the power trace output log (0=no comp, 9=highest)
-steady_power_levels_enabled                    0 # produce a file for the steady power levels (1=On, 0=Off)
-steady_state_definition                  8:4 # allowed deviation:number of samples
-gpgpu_max_cycle                        0 # terminates gpu simulation early (0 = no limit)
-gpgpu_max_insn                         0 # terminates gpu simulation early (0 = no limit)
-gpgpu_max_cta                          0 # terminates gpu simulation early (0 = no limit)
-gpgpu_runtime_stat                   500 # display runtime statistics such as dram utilization {<freq>:<flag>}
-liveness_message_freq                    1 # Minimum number of seconds between simulation liveness messages (0 = always print)
-gpgpu_flush_l1_cache                    0 # Flush L1 cache at the end of each kernel call
-gpgpu_flush_l2_cache                    0 # Flush L2 cache at the end of each kernel call
-gpgpu_deadlock_detect                    1 # Stop the simulation at deadlock (1=on (default), 0=off)
-gpgpu_ptx_instruction_classification                    0 # if enabled will classify ptx instruction types per kernel (Max 255 kernels now)
-gpgpu_ptx_sim_mode                     0 # Select between Performance (default) or Functional simulation (1)
-gpgpu_clock_domains 700.0:700.0:700.0:924.0 # Clock Domain Frequencies in MhZ {<Core Clock>:<ICNT Clock>:<L2 Clock>:<DRAM Clock>}
-gpgpu_max_concurrent_kernel                    8 # maximum kernels that can run concurrently on GPU
-gpgpu_cflog_interval                    0 # Interval between each snapshot in control flow logger
-visualizer_enabled                     0 # Turn on visualizer output (1=On, 0=Off)
-visualizer_outputfile                 NULL # Specifies the output log file for visualizer
-visualizer_zlevel                      6 # Compression level of the visualizer output log (0=no comp, 9=highest)
-trace_enabled                          0 # Turn on traces
-trace_components                    none # comma seperated list of traces to enable. Complete list found in trace_streams.tup. Default none
-trace_sampling_core                    0 # The core which is printed using CORE_DPRINTF. Default 0
-trace_sampling_memory_partition                   -1 # The memory partition which is printed using MEMPART_DPRINTF. Default -1 (i.e. all)
-enable_ptx_file_line_stats                    1 # Turn on PTX source line statistic profiling. (1 = On)
-ptx_line_stats_filename gpgpu_inst_stats.txt # Output file for PTX source line statistics.
-save_embedded_ptx                      0 # saves ptx files embedded in binary as <n>.ptx
-keep                                   0 # keep intermediate files created by GPGPU-Sim when interfacing with external programs
-gpgpu_ptx_save_converted_ptxplus                    0 # Saved converted ptxplus to a file
-ptx_opcode_latency_int         4,13,4,5,145 # Opcode latencies for integers <ADD,MAX,MUL,MAD,DIV>Default 1,1,19,25,145
-ptx_opcode_latency_fp          4,13,4,5,39 # Opcode latencies for single precision floating points <ADD,MAX,MUL,MAD,DIV>Default 1,1,1,1,30
-ptx_opcode_latency_dp         8,19,8,8,330 # Opcode latencies for double precision floating points <ADD,MAX,MUL,MAD,DIV>Default 8,8,8,8,335
-ptx_opcode_initiation_int            1,2,2,1,8 # Opcode initiation intervals for integers <ADD,MAX,MUL,MAD,DIV>Default 1,1,4,4,32
-ptx_opcode_initiation_fp            1,2,1,1,4 # Opcode initiation intervals for single precision floating points <ADD,MAX,MUL,MAD,DIV>Default 1,1,1,1,5
-ptx_opcode_initiation_dp         8,16,8,8,130 # Opcode initiation intervals for double precision floating points <ADD,MAX,MUL,MAD,DIV>Default 8,8,8,8,130
DRAM Timing Options:
nbk                                    16 # number of banks
CCD                                     2 # column to column delay
RRD                                     6 # minimal delay between activation of rows in different banks
RCD                                    12 # row to column delay
RAS                                    28 # time needed to activate row
RP                                     12 # time needed to precharge (deactivate) row
RC                                     40 # row cycle time
CDLR                                    5 # switching from write to read (changes tWTR)
WR                                     12 # last data-in to row precharge
CL                                     12 # CAS latency
WL                                      4 # Write latency
nbkgrp                                  4 # number of bank groups
CCDL                                    3 # column to column delay between accesses to different bank groups
RTPL                                    2 # read to precharge delay between accesses to different bank groups
Total number of memory sub partition = 12
addr_dec_mask[CHIP]  = 0000000000000000 	high:64 low:0
addr_dec_mask[BK]    = 000000000000e100 	high:16 low:8
addr_dec_mask[ROW]   = 000000000fff0000 	high:28 low:16
addr_dec_mask[COL]   = 0000000000001eff 	high:13 low:0
addr_dec_mask[BURST] = 000000000000003f 	high:6 low:0
sub_partition_id_mask = 0000000000000100
GPGPU-Sim uArch: clock freqs: 700000000.000000:700000000.000000:700000000.000000:924000000.000000
GPGPU-Sim uArch: clock periods: 0.00000000142857142857:0.00000000142857142857:0.00000000142857142857:0.00000000108225108225
*** Initializing Memory Statistics ***
GPGPU-Sim uArch: interconnect node map (shaderID+MemID to icntID)
GPGPU-Sim uArch: Memory nodes ID start from index: 15
GPGPU-Sim uArch:    0   1   2   3   4
GPGPU-Sim uArch:    5   6   7   8   9
GPGPU-Sim uArch:   10  11  12  13  14
GPGPU-Sim uArch:   15  16  17  18  19
GPGPU-Sim uArch:   20  21  22  23  24
GPGPU-Sim uArch:   25  26
GPGPU-Sim uArch: interconnect node reverse map (icntID to shaderID+MemID)
GPGPU-Sim uArch: Memory nodes start from ID: 15
GPGPU-Sim uArch:    0   1   2   3   4
GPGPU-Sim uArch:    5   6   7   8   9
GPGPU-Sim uArch:   10  11  12  13  14
GPGPU-Sim uArch:   15  16  17  18  19
GPGPU-Sim uArch:   20  21  22  23  24
GPGPU-Sim uArch:   25  26
8b51d2418a0658287a30fe3c4cc1fd21  /home/ly/下载/test/gpgpu-sim_distribution-master/ispass2009-benchmarks-master_2/bin/release/MM
GPGPU-Sim uArch: performance model initialization complete.
GPGPU-Sim PTX: __cudaRegisterFatBinary, fat_cubin_handle = 1, filename=mm.cu
self exe links to: /home/ly/下载/test/gpgpu-sim_distribution-master/ispass2009-benchmarks-master_2/bin/release/MM
Running md5sum using "md5sum /home/ly/下载/test/gpgpu-sim_distribution-master/ispass2009-benchmarks-master_2/bin/release/MM "
Running cuobjdump using "$CUDA_INSTALL_PATH/bin/cuobjdump -ptx -elf -sass /home/ly/下载/test/gpgpu-sim_distribution-master/ispass2009-benchmarks-master_2/bin/release/MM > _cuobjdump_complete_output_lPOvst"
Parsing file _cuobjdump_complete_output_lPOvst
######### cuobjdump parser ########
## Adding new section ELF
Adding arch: sm_10
Adding identifier: mm.cu
## Adding new section PTX
Adding ptx filename: _cuobjdump_1.ptx
Adding arch: sm_10
Adding identifier: mm.cu
## Adding new section ELF
Adding arch: sm_20
Adding identifier: mm.cu
## Adding new section PTX
Adding ptx filename: _cuobjdump_2.ptx
Adding arch: sm_20
Adding identifier: mm.cu
Done parsing!!!
GPGPU-Sim PTX: __cudaRegisterFunction _Z14matrix_mul_gpuPiS_S_i : hostFun 0x0x400ce0, fat_cubin_handle = 1
GPGPU-Sim PTX: instruction assembly for function '_Z14matrix_mul_gpuPiS_S_i'...   done.
GPGPU-Sim PTX: finding reconvergence points for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: Finding dominators for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: Finding immediate dominators for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: Finding postdominators for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: Finding immediate postdominators for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: pre-decoding instructions for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: reconvergence points for _Z14matrix_mul_gpuPiS_S_i...
GPGPU-Sim PTX:  1 (potential) branch divergence @  PC=0x048 (_1.ptx:71) @%p1 bra $Lt_0_2306;
GPGPU-Sim PTX:    immediate post dominator      @  PC=0x170 (_1.ptx:114) ld.param.u64 %rd11, [__cudaparm__Z14matrix_mul_gpuPiS_S_i_P];
GPGPU-Sim PTX:  2 (potential) branch divergence @  PC=0x130 (_1.ptx:103) @%p2 bra $Lt_0_1794;
GPGPU-Sim PTX:    immediate post dominator      @  PC=0x138 (_1.ptx:104) bra.uni $Lt_0_1282;
GPGPU-Sim PTX:  3 (potential) branch divergence @  PC=0x138 (_1.ptx:104) bra.uni $Lt_0_1282;
GPGPU-Sim PTX:    immediate post dominator      @  PC=0x170 (_1.ptx:114) ld.param.u64 %rd11, [__cudaparm__Z14matrix_mul_gpuPiS_S_i_P];
GPGPU-Sim PTX: ... end of reconvergence points for _Z14matrix_mul_gpuPiS_S_i
GPGPU-Sim PTX: ... done pre-decoding instructions for '_Z14matrix_mul_gpuPiS_S_i'.
GPGPU-Sim PTX: finished parsing EMBEDDED .ptx file _1.ptx
Adding _cuobjdump_2.ptx with cubin handle 1
GPGPU-Sim PTX: extracting embedded .ptx to temporary file "_ptx_pS0Cul"
Running: cat _ptx_pS0Cul | sed 's/.version 1.5/.version 1.4/' | sed 's/, texmode_independent//' | sed 's/\(\.extern \.const\[1\] .b8 \w\+\)\[\]/\1\[1\]/' | sed 's/const\[.\]/const\[0\]/g' > _ptx2_FeSKwd
GPGPU-Sim PTX: generating ptxinfo using "$CUDA_INSTALL_PATH/bin/ptxas --gpu-name=sm_20 -v _ptx2_FeSKwd --output-file  /dev/null 2> _ptx_pS0Culinfo"
GPGPU-Sim PTX: Kernel '_Z14matrix_mul_gpuPiS_S_i' : regs=14, lmem=0, smem=0, cmem=60
GPGPU-Sim PTX: removing ptxinfo using "rm -f _ptx_pS0Cul _ptx2_FeSKwd _ptx_pS0Culinfo"
GPGPU-Sim PTX: loading globals with explicit initializers... 
GPGPU-Sim PTX: finished loading globals (0 bytes total).
GPGPU-Sim PTX: loading constants with explicit initializers...  done.
Block(10,10)   Grid(15,15).

GPGPU-Sim PTX: cudaLaunch for 0x0x400ce0 (mode=performance simulation) on stream 0
GPGPU-Sim PTX: pushing kernel '_Z14matrix_mul_gpuPiS_S_i' to stream 0, gridDim= (15,15,1) blockDim = (10,10,1) 
kernel '_Z14matrix_mul_gpuPiS_S_i' transfer to GPU hardware scheduler
GPGPU-Sim uArch: Shader 8 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: CTA/core = 8, limited by: cta_limit
GPGPU-Sim uArch: core:  8, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 16 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 16, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 24 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 24, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 32 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 32, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 40 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 40, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 48 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 48, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 56 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 56, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 64 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 64, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 72 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 72, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 80 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 80, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 88 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 88, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 96 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 96, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 104 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:104, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 112 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:112, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 0 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  0, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 9 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  9, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 17 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 17, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 25 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 25, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 33 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 33, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 41 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 41, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 49 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 49, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 57 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 57, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 65 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 65, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 73 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 73, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 81 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 81, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 89 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 89, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 97 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 97, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 105 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:105, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 113 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:113, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 1 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  1, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 10 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 10, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 18 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 18, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 26 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 26, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 34 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 34, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 42 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 42, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 50 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 50, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 58 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 58, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 66 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 66, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 74 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 74, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 82 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 82, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 90 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 90, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 98 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 98, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 106 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:106, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 114 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:114, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 2 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  2, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 11 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 11, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 19 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 19, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 27 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 27, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 35 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 35, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 43 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 43, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 51 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 51, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 59 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 59, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 67 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 67, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 75 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 75, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 83 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 83, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 91 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 91, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 99 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 99, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 107 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:107, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 115 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:115, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 3 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  3, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 12 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 12, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 20 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 20, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 28 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 28, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 36 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 36, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 44 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 44, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 52 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 52, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 60 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 60, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 68 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 68, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 76 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 76, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 84 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 84, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 92 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 92, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 100 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:100, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 108 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:108, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 116 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:116, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 4 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  4, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 13 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 13, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 21 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 21, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 29 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 29, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 37 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 37, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 45 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 45, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 53 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 53, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 61 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 61, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 69 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 69, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 77 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 77, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 85 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 85, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 93 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 93, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 101 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:101, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 109 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:109, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 117 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:117, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 5 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  5, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 14 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 14, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 22 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 22, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 30 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 30, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 38 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 38, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 46 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 46, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 54 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 54, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 62 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 62, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 70 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 70, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 78 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 78, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 86 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 86, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 94 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 94, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 102 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:102, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 110 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:110, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 118 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:118, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 6 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  6, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 15 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 15, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 23 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 23, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 31 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 31, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 39 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 39, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 47 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 47, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 55 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 55, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 63 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 63, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 71 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 71, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 79 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 79, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 87 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 87, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 95 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 95, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 103 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:103, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 111 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:111, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 119 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:119, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 7 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  7, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: core:  8, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 16, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 24, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 32, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 40, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 48, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 56, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 64, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 72, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 80, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 88, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 96, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core:104, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core:112, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core:  0, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core:  9, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 17, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 25, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 33, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 41, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 49, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 57, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 65, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 73, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 81, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 89, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 97, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core:105, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core:113, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core:  1, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 10, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 18, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 26, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 34, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 42, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 50, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 58, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 66, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 74, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 82, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 90, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 98, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core:106, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core:114, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core:  2, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 11, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 19, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 27, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 35, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 43, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 51, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 59, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 67, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 75, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 83, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 91, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 99, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core:107, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core:115, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core:  3, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 12, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 20, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 28, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 36, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 44, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 52, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 60, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 68, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 76, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 84, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 92, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core:100, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core:108, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core:116, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core:  4, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 13, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 21, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 29, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 37, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 45, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 53, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 61, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 69, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 77, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 85, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 93, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core:101, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core:109, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core:117, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core:  5, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 14, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 22, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 30, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 38, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 46, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 54, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 62, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 70, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 78, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 86, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 94, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core:102, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core:110, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core:118, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core:  6, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: cycles simulated: 500  inst.: 49456 (ipc=98.9) sim_rate=49456 (inst/sec) elapsed = 0:0:00:01 / Mon Jun 14 16:14:50 2021
GPGPU-Sim PTX: 100000 instructions simulated : ctaid=(2,12,0) tid=(5,7,0)
GPGPU-Sim uArch: cycles simulated: 1000  inst.: 155464 (ipc=155.5) sim_rate=77732 (inst/sec) elapsed = 0:0:00:02 / Mon Jun 14 16:14:51 2021
GPGPU-Sim PTX: 200000 instructions simulated : ctaid=(1,9,0) tid=(7,2,0)
GPGPU-Sim PTX: 300000 instructions simulated : ctaid=(13,12,0) tid=(9,1,0)
GPGPU-Sim uArch: cycles simulated: 1500  inst.: 294800 (ipc=196.5) sim_rate=98266 (inst/sec) elapsed = 0:0:00:03 / Mon Jun 14 16:14:52 2021
GPGPU-Sim PTX: 400000 instructions simulated : ctaid=(6,0,0) tid=(5,9,0)
GPGPU-Sim PTX: 500000 instructions simulated : ctaid=(3,11,0) tid=(7,4,0)
GPGPU-Sim uArch: cycles simulated: 2000  inst.: 460980 (ipc=230.5) sim_rate=115245 (inst/sec) elapsed = 0:0:00:04 / Mon Jun 14 16:14:53 2021
GPGPU-Sim PTX: 600000 instructions simulated : ctaid=(0,5,0) tid=(5,5,0)
GPGPU-Sim PTX: 700000 instructions simulated : ctaid=(3,7,0) tid=(1,1,0)
GPGPU-Sim uArch: cycles simulated: 2500  inst.: 658596 (ipc=263.4) sim_rate=131719 (inst/sec) elapsed = 0:0:00:05 / Mon Jun 14 16:14:54 2021
GPGPU-Sim uArch: cycles simulated: 3000  inst.: 686456 (ipc=228.8) sim_rate=114409 (inst/sec) elapsed = 0:0:00:06 / Mon Jun 14 16:14:55 2021
GPGPU-Sim uArch: cycles simulated: 3500  inst.: 722996 (ipc=206.6) sim_rate=103285 (inst/sec) elapsed = 0:0:00:07 / Mon Jun 14 16:14:56 2021
GPGPU-Sim PTX: 800000 instructions simulated : ctaid=(11,9,0) tid=(9,7,0)
GPGPU-Sim uArch: cycles simulated: 4000  inst.: 767180 (ipc=191.8) sim_rate=95897 (inst/sec) elapsed = 0:0:00:08 / Mon Jun 14 16:14:57 2021
GPGPU-Sim uArch: cycles simulated: 4500  inst.: 852268 (ipc=189.4) sim_rate=94696 (inst/sec) elapsed = 0:0:00:09 / Mon Jun 14 16:14:58 2021
GPGPU-Sim PTX: 900000 instructions simulated : ctaid=(4,10,0) tid=(9,1,0)
GPGPU-Sim PTX: 1000000 instructions simulated : ctaid=(9,0,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 5000  inst.: 1010580 (ipc=202.1) sim_rate=101058 (inst/sec) elapsed = 0:0:00:10 / Mon Jun 14 16:14:59 2021
GPGPU-Sim PTX: 1100000 instructions simulated : ctaid=(14,2,0) tid=(3,0,0)
GPGPU-Sim PTX: 1200000 instructions simulated : ctaid=(13,8,0) tid=(1,7,0)
GPGPU-Sim PTX: 1300000 instructions simulated : ctaid=(1,12,0) tid=(5,1,0)
GPGPU-Sim PTX: 1400000 instructions simulated : ctaid=(3,6,0) tid=(9,3,0)
GPGPU-Sim uArch: cycles simulated: 5500  inst.: 1387024 (ipc=252.2) sim_rate=126093 (inst/sec) elapsed = 0:0:00:11 / Mon Jun 14 16:15:00 2021
GPGPU-Sim PTX: 1500000 instructions simulated : ctaid=(10,10,0) tid=(3,6,0)
GPGPU-Sim PTX: 1600000 instructions simulated : ctaid=(6,3,0) tid=(3,4,0)
GPGPU-Sim PTX: 1700000 instructions simulated : ctaid=(11,7,0) tid=(7,0,0)
GPGPU-Sim PTX: 1800000 instructions simulated : ctaid=(4,10,0) tid=(9,7,0)
GPGPU-Sim uArch: cycles simulated: 6000  inst.: 1834944 (ipc=305.8) sim_rate=152912 (inst/sec) elapsed = 0:0:00:12 / Mon Jun 14 16:15:01 2021
GPGPU-Sim PTX: 1900000 instructions simulated : ctaid=(10,0,0) tid=(5,3,0)
GPGPU-Sim PTX: 2000000 instructions simulated : ctaid=(12,3,0) tid=(9,5,0)
GPGPU-Sim PTX: 2100000 instructions simulated : ctaid=(3,1,0) tid=(3,4,0)
GPGPU-Sim PTX: 2200000 instructions simulated : ctaid=(8,9,0) tid=(3,8,0)
GPGPU-Sim PTX: 2300000 instructions simulated : ctaid=(12,7,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 6500  inst.: 2264888 (ipc=348.4) sim_rate=174222 (inst/sec) elapsed = 0:0:00:13 / Mon Jun 14 16:15:02 2021
GPGPU-Sim PTX: 2400000 instructions simulated : ctaid=(6,6,0) tid=(1,9,0)
GPGPU-Sim PTX: 2500000 instructions simulated : ctaid=(9,9,0) tid=(3,0,0)
GPGPU-Sim PTX: 2600000 instructions simulated : ctaid=(4,12,0) tid=(3,4,0)
GPGPU-Sim PTX: 2700000 instructions simulated : ctaid=(12,1,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 7000  inst.: 2673128 (ipc=381.9) sim_rate=190937 (inst/sec) elapsed = 0:0:00:14 / Mon Jun 14 16:15:03 2021
GPGPU-Sim PTX: 2800000 instructions simulated : ctaid=(3,0,0) tid=(7,2,0)
GPGPU-Sim PTX: 2900000 instructions simulated : ctaid=(0,10,0) tid=(1,5,0)
GPGPU-Sim PTX: 3000000 instructions simulated : ctaid=(5,4,0) tid=(1,3,0)
GPGPU-Sim uArch: cycles simulated: 7500  inst.: 3038248 (ipc=405.1) sim_rate=189890 (inst/sec) elapsed = 0:0:00:16 / Mon Jun 14 16:15:05 2021
GPGPU-Sim PTX: 3100000 instructions simulated : ctaid=(5,1,0) tid=(5,1,0)
GPGPU-Sim PTX: 3200000 instructions simulated : ctaid=(12,11,0) tid=(7,8,0)
GPGPU-Sim PTX: 3300000 instructions simulated : ctaid=(5,1,0) tid=(5,1,0)
GPGPU-Sim PTX: 3400000 instructions simulated : ctaid=(4,12,0) tid=(9,7,0)
GPGPU-Sim uArch: cycles simulated: 8000  inst.: 3416336 (ipc=427.0) sim_rate=200960 (inst/sec) elapsed = 0:0:00:17 / Mon Jun 14 16:15:06 2021
GPGPU-Sim PTX: 3500000 instructions simulated : ctaid=(11,13,0) tid=(1,9,0)
GPGPU-Sim PTX: 3600000 instructions simulated : ctaid=(10,12,0) tid=(9,1,0)
GPGPU-Sim PTX: 3700000 instructions simulated : ctaid=(4,8,0) tid=(7,4,0)
GPGPU-Sim PTX: 3800000 instructions simulated : ctaid=(13,11,0) tid=(3,0,0)
GPGPU-Sim uArch: cycles simulated: 8500  inst.: 3794960 (ipc=446.5) sim_rate=210831 (inst/sec) elapsed = 0:0:00:18 / Mon Jun 14 16:15:07 2021
GPGPU-Sim PTX: 3900000 instructions simulated : ctaid=(10,10,0) tid=(5,3,0)
GPGPU-Sim PTX: 4000000 instructions simulated : ctaid=(2,8,0) tid=(1,3,0)
GPGPU-Sim PTX: 4100000 instructions simulated : ctaid=(8,14,0) tid=(9,1,0)
GPGPU-Sim uArch: cycles simulated: 9000  inst.: 4145552 (ipc=460.6) sim_rate=218186 (inst/sec) elapsed = 0:0:00:19 / Mon Jun 14 16:15:08 2021
GPGPU-Sim PTX: 4200000 instructions simulated : ctaid=(13,11,0) tid=(3,8,0)
GPGPU-Sim PTX: 4300000 instructions simulated : ctaid=(9,1,0) tid=(3,2,0)
GPGPU-Sim PTX: 4400000 instructions simulated : ctaid=(7,5,0) tid=(1,1,0)
GPGPU-Sim PTX: 4500000 instructions simulated : ctaid=(4,6,0) tid=(1,9,0)
GPGPU-Sim uArch: cycles simulated: 9500  inst.: 4494464 (ipc=473.1) sim_rate=224723 (inst/sec) elapsed = 0:0:00:20 / Mon Jun 14 16:15:09 2021
GPGPU-Sim PTX: 4600000 instructions simulated : ctaid=(10,0,0) tid=(9,9,0)
GPGPU-Sim PTX: 4700000 instructions simulated : ctaid=(3,14,0) tid=(9,3,0)
GPGPU-Sim PTX: 4800000 instructions simulated : ctaid=(11,7,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 10000  inst.: 4846372 (ipc=484.6) sim_rate=220289 (inst/sec) elapsed = 0:0:00:22 / Mon Jun 14 16:15:11 2021
GPGPU-Sim PTX: 4900000 instructions simulated : ctaid=(3,10,0) tid=(5,3,0)
GPGPU-Sim PTX: 5000000 instructions simulated : ctaid=(10,14,0) tid=(5,9,0)
GPGPU-Sim PTX: 5100000 instructions simulated : ctaid=(8,10,0) tid=(1,3,0)
GPGPU-Sim PTX: 5200000 instructions simulated : ctaid=(11,1,0) tid=(1,9,0)
GPGPU-Sim uArch: cycles simulated: 10500  inst.: 5179948 (ipc=493.3) sim_rate=225215 (inst/sec) elapsed = 0:0:00:23 / Mon Jun 14 16:15:12 2021
GPGPU-Sim PTX: 5300000 instructions simulated : ctaid=(7,0,0) tid=(9,3,0)
GPGPU-Sim PTX: 5400000 instructions simulated : ctaid=(9,7,0) tid=(9,5,0)
GPGPU-Sim PTX: 5500000 instructions simulated : ctaid=(1,14,0) tid=(7,6,0)
GPGPU-Sim uArch: cycles simulated: 11000  inst.: 5542924 (ipc=503.9) sim_rate=230955 (inst/sec) elapsed = 0:0:00:24 / Mon Jun 14 16:15:13 2021
GPGPU-Sim PTX: 5600000 instructions simulated : ctaid=(3,8,0) tid=(7,8,0)
GPGPU-Sim PTX: 5700000 instructions simulated : ctaid=(8,0,0) tid=(3,8,0)
GPGPU-Sim PTX: 5800000 instructions simulated : ctaid=(2,7,0) tid=(1,3,0)
GPGPU-Sim PTX: 5900000 instructions simulated : ctaid=(4,12,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 11500  inst.: 5863464 (ipc=509.9) sim_rate=234538 (inst/sec) elapsed = 0:0:00:25 / Mon Jun 14 16:15:14 2021
GPGPU-Sim PTX: 6000000 instructions simulated : ctaid=(10,4,0) tid=(7,4,0)
GPGPU-Sim PTX: 6100000 instructions simulated : ctaid=(7,9,0) tid=(5,7,0)
GPGPU-Sim PTX: 6200000 instructions simulated : ctaid=(9,7,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 12000  inst.: 6198952 (ipc=516.6) sim_rate=238421 (inst/sec) elapsed = 0:0:00:26 / Mon Jun 14 16:15:15 2021
GPGPU-Sim PTX: 6300000 instructions simulated : ctaid=(0,11,0) tid=(9,7,0)
GPGPU-Sim PTX: 6400000 instructions simulated : ctaid=(11,5,0) tid=(9,3,0)
GPGPU-Sim PTX: 6500000 instructions simulated : ctaid=(6,10,0) tid=(3,0,0)
GPGPU-Sim uArch: cycles simulated: 12500  inst.: 6514828 (ipc=521.2) sim_rate=241289 (inst/sec) elapsed = 0:0:00:27 / Mon Jun 14 16:15:16 2021
GPGPU-Sim PTX: 6600000 instructions simulated : ctaid=(9,0,0) tid=(3,6,0)
GPGPU-Sim PTX: 6700000 instructions simulated : ctaid=(7,2,0) tid=(5,9,0)
GPGPU-Sim PTX: 6800000 instructions simulated : ctaid=(1,14,0) tid=(9,5,0)
GPGPU-Sim PTX: 6900000 instructions simulated : ctaid=(9,14,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 13000  inst.: 6850760 (ipc=527.0) sim_rate=244670 (inst/sec) elapsed = 0:0:00:28 / Mon Jun 14 16:15:17 2021
GPGPU-Sim PTX: 7000000 instructions simulated : ctaid=(5,1,0) tid=(1,7,0)
GPGPU-Sim PTX: 7100000 instructions simulated : ctaid=(13,0,0) tid=(5,1,0)
GPGPU-Sim PTX: 7200000 instructions simulated : ctaid=(10,11,0) tid=(9,3,0)
GPGPU-Sim uArch: cycles simulated: 13500  inst.: 7177796 (ipc=531.7) sim_rate=239259 (inst/sec) elapsed = 0:0:00:30 / Mon Jun 14 16:15:19 2021
GPGPU-Sim PTX: 7300000 instructions simulated : ctaid=(3,5,0) tid=(3,4,0)
GPGPU-Sim PTX: 7400000 instructions simulated : ctaid=(1,12,0) tid=(3,0,0)
GPGPU-Sim PTX: 7500000 instructions simulated : ctaid=(2,12,0) tid=(3,2,0)
GPGPU-Sim uArch: cycles simulated: 14000  inst.: 7513232 (ipc=536.7) sim_rate=242362 (inst/sec) elapsed = 0:0:00:31 / Mon Jun 14 16:15:20 2021
GPGPU-Sim PTX: 7600000 instructions simulated : ctaid=(12,4,0) tid=(3,8,0)
GPGPU-Sim PTX: 7700000 instructions simulated : ctaid=(5,6,0) tid=(5,9,0)
GPGPU-Sim PTX: 7800000 instructions simulated : ctaid=(10,0,0) tid=(7,4,0)
GPGPU-Sim PTX: 7900000 instructions simulated : ctaid=(11,5,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 14500  inst.: 7861928 (ipc=542.2) sim_rate=245685 (inst/sec) elapsed = 0:0:00:32 / Mon Jun 14 16:15:21 2021
GPGPU-Sim PTX: 8000000 instructions simulated : ctaid=(13,13,0) tid=(5,5,0)
GPGPU-Sim PTX: 8100000 instructions simulated : ctaid=(8,7,0) tid=(1,3,0)
GPGPU-Sim PTX: 8200000 instructions simulated : ctaid=(9,13,0) tid=(7,6,0)
GPGPU-Sim uArch: cycles simulated: 15000  inst.: 8177372 (ipc=545.2) sim_rate=247799 (inst/sec) elapsed = 0:0:00:33 / Mon Jun 14 16:15:22 2021
GPGPU-Sim PTX: 8300000 instructions simulated : ctaid=(4,4,0) tid=(3,0,0)
GPGPU-Sim PTX: 8400000 instructions simulated : ctaid=(10,10,0) tid=(1,7,0)
GPGPU-Sim PTX: 8500000 instructions simulated : ctaid=(0,9,0) tid=(9,9,0)
GPGPU-Sim uArch: cycles simulated: 15500  inst.: 8534132 (ipc=550.6) sim_rate=251003 (inst/sec) elapsed = 0:0:00:34 / Mon Jun 14 16:15:23 2021
GPGPU-Sim PTX: 8600000 instructions simulated : ctaid=(12,12,0) tid=(1,9,0)
GPGPU-Sim PTX: 8700000 instructions simulated : ctaid=(4,3,0) tid=(1,3,0)
GPGPU-Sim PTX: 8800000 instructions simulated : ctaid=(9,14,0) tid=(1,1,0)
GPGPU-Sim uArch: cycles simulated: 16000  inst.: 8846628 (ipc=552.9) sim_rate=252760 (inst/sec) elapsed = 0:0:00:35 / Mon Jun 14 16:15:24 2021
GPGPU-Sim PTX: 8900000 instructions simulated : ctaid=(0,1,0) tid=(7,4,0)
GPGPU-Sim PTX: 9000000 instructions simulated : ctaid=(10,11,0) tid=(1,7,0)
GPGPU-Sim PTX: 9100000 instructions simulated : ctaid=(12,14,0) tid=(1,9,0)
GPGPU-Sim PTX: 9200000 instructions simulated : ctaid=(10,7,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 16500  inst.: 9193752 (ipc=557.2) sim_rate=248479 (inst/sec) elapsed = 0:0:00:37 / Mon Jun 14 16:15:26 2021
GPGPU-Sim PTX: 9300000 instructions simulated : ctaid=(10,2,0) tid=(1,9,0)
GPGPU-Sim PTX: 9400000 instructions simulated : ctaid=(2,12,0) tid=(1,9,0)
GPGPU-Sim PTX: 9500000 instructions simulated : ctaid=(3,9,0) tid=(7,8,0)
GPGPU-Sim uArch: cycles simulated: 17000  inst.: 9519480 (ipc=560.0) sim_rate=250512 (inst/sec) elapsed = 0:0:00:38 / Mon Jun 14 16:15:27 2021
GPGPU-Sim PTX: 9600000 instructions simulated : ctaid=(12,9,0) tid=(5,9,0)
GPGPU-Sim PTX: 9700000 instructions simulated : ctaid=(1,3,0) tid=(7,0,0)
GPGPU-Sim PTX: 9800000 instructions simulated : ctaid=(7,0,0) tid=(5,7,0)
GPGPU-Sim uArch: cycles simulated: 17500  inst.: 9845216 (ipc=562.6) sim_rate=252441 (inst/sec) elapsed = 0:0:00:39 / Mon Jun 14 16:15:28 2021
GPGPU-Sim PTX: 9900000 instructions simulated : ctaid=(1,6,0) tid=(3,2,0)
GPGPU-Sim PTX: 10000000 instructions simulated : ctaid=(10,13,0) tid=(9,1,0)
GPGPU-Sim PTX: 10100000 instructions simulated : ctaid=(10,10,0) tid=(1,5,0)
GPGPU-Sim PTX: 10200000 instructions simulated : ctaid=(2,4,0) tid=(1,5,0)
GPGPU-Sim uArch: cycles simulated: 18000  inst.: 10175904 (ipc=565.3) sim_rate=254397 (inst/sec) elapsed = 0:0:00:40 / Mon Jun 14 16:15:29 2021
GPGPU-Sim PTX: 10300000 instructions simulated : ctaid=(10,6,0) tid=(9,7,0)
GPGPU-Sim PTX: 10400000 instructions simulated : ctaid=(8,8,0) tid=(9,9,0)
GPGPU-Sim PTX: 10500000 instructions simulated : ctaid=(13,8,0) tid=(3,0,0)
GPGPU-Sim uArch: cycles simulated: 18500  inst.: 10526504 (ipc=569.0) sim_rate=256744 (inst/sec) elapsed = 0:0:00:41 / Mon Jun 14 16:15:30 2021
GPGPU-Sim PTX: 10600000 instructions simulated : ctaid=(13,12,0) tid=(5,7,0)
GPGPU-Sim PTX: 10700000 instructions simulated : ctaid=(14,2,0) tid=(7,4,0)
GPGPU-Sim PTX: 10800000 instructions simulated : ctaid=(9,1,0) tid=(3,6,0)
GPGPU-Sim PTX: 10900000 instructions simulated : ctaid=(14,6,0) tid=(7,8,0)
GPGPU-Sim uArch: cycles simulated: 19000  inst.: 10861260 (ipc=571.6) sim_rate=258601 (inst/sec) elapsed = 0:0:00:42 / Mon Jun 14 16:15:31 2021
GPGPU-Sim PTX: 11000000 instructions simulated : ctaid=(3,13,0) tid=(1,3,0)
GPGPU-Sim PTX: 11100000 instructions simulated : ctaid=(13,11,0) tid=(7,6,0)
GPGPU-Sim PTX: 11200000 instructions simulated : ctaid=(11,6,0) tid=(7,6,0)
GPGPU-Sim uArch: cycles simulated: 19500  inst.: 11182232 (ipc=573.4) sim_rate=260051 (inst/sec) elapsed = 0:0:00:43 / Mon Jun 14 16:15:32 2021
GPGPU-Sim PTX: 11300000 instructions simulated : ctaid=(7,7,0) tid=(5,9,0)
GPGPU-Sim PTX: 11400000 instructions simulated : ctaid=(9,3,0) tid=(9,1,0)
GPGPU-Sim PTX: 11500000 instructions simulated : ctaid=(14,10,0) tid=(1,5,0)
GPGPU-Sim uArch: cycles simulated: 20000  inst.: 11510844 (ipc=575.5) sim_rate=255796 (inst/sec) elapsed = 0:0:00:45 / Mon Jun 14 16:15:34 2021
GPGPU-Sim PTX: 11600000 instructions simulated : ctaid=(2,13,0) tid=(3,0,0)
GPGPU-Sim PTX: 11700000 instructions simulated : ctaid=(10,10,0) tid=(5,3,0)
GPGPU-Sim PTX: 11800000 instructions simulated : ctaid=(1,6,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 20500  inst.: 11845228 (ipc=577.8) sim_rate=257504 (inst/sec) elapsed = 0:0:00:46 / Mon Jun 14 16:15:35 2021
GPGPU-Sim PTX: 11900000 instructions simulated : ctaid=(12,9,0) tid=(5,1,0)
GPGPU-Sim PTX: 12000000 instructions simulated : ctaid=(9,2,0) tid=(9,5,0)
GPGPU-Sim PTX: 12100000 instructions simulated : ctaid=(10,1,0) tid=(7,6,0)
GPGPU-Sim PTX: 12200000 instructions simulated : ctaid=(12,2,0) tid=(3,2,0)
GPGPU-Sim uArch: cycles simulated: 21000  inst.: 12183192 (ipc=580.2) sim_rate=259216 (inst/sec) elapsed = 0:0:00:47 / Mon Jun 14 16:15:36 2021
GPGPU-Sim PTX: 12300000 instructions simulated : ctaid=(7,7,0) tid=(7,6,0)
GPGPU-Sim PTX: 12400000 instructions simulated : ctaid=(9,1,0) tid=(3,0,0)
GPGPU-Sim PTX: 12500000 instructions simulated : ctaid=(14,10,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 21500  inst.: 12511912 (ipc=581.9) sim_rate=260664 (inst/sec) elapsed = 0:0:00:48 / Mon Jun 14 16:15:37 2021
GPGPU-Sim PTX: 12600000 instructions simulated : ctaid=(13,7,0) tid=(9,9,0)
GPGPU-Sim PTX: 12700000 instructions simulated : ctaid=(0,13,0) tid=(5,5,0)
GPGPU-Sim PTX: 12800000 instructions simulated : ctaid=(5,9,0) tid=(1,9,0)
GPGPU-Sim PTX: 12900000 instructions simulated : ctaid=(5,13,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 22000  inst.: 12861064 (ipc=584.6) sim_rate=262470 (inst/sec) elapsed = 0:0:00:49 / Mon Jun 14 16:15:38 2021
GPGPU-Sim PTX: 13000000 instructions simulated : ctaid=(14,11,0) tid=(9,7,0)
GPGPU-Sim PTX: 13100000 instructions simulated : ctaid=(1,14,0) tid=(9,3,0)
GPGPU-Sim PTX: 13200000 instructions simulated : ctaid=(11,12,0) tid=(3,6,0)
GPGPU-Sim uArch: cycles simulated: 22500  inst.: 13200500 (ipc=586.7) sim_rate=264010 (inst/sec) elapsed = 0:0:00:50 / Mon Jun 14 16:15:39 2021
GPGPU-Sim PTX: 13300000 instructions simulated : ctaid=(11,1,0) tid=(7,0,0)
GPGPU-Sim PTX: 13400000 instructions simulated : ctaid=(11,4,0) tid=(5,9,0)
GPGPU-Sim PTX: 13500000 instructions simulated : ctaid=(3,13,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 23000  inst.: 13520992 (ipc=587.9) sim_rate=265117 (inst/sec) elapsed = 0:0:00:51 / Mon Jun 14 16:15:40 2021
GPGPU-Sim PTX: 13600000 instructions simulated : ctaid=(5,12,0) tid=(9,5,0)
GPGPU-Sim PTX: 13700000 instructions simulated : ctaid=(3,0,0) tid=(9,7,0)
GPGPU-Sim PTX: 13800000 instructions simulated : ctaid=(1,2,0) tid=(3,6,0)
GPGPU-Sim uArch: cycles simulated: 23500  inst.: 13840084 (ipc=588.9) sim_rate=266155 (inst/sec) elapsed = 0:0:00:52 / Mon Jun 14 16:15:41 2021
GPGPU-Sim PTX: 13900000 instructions simulated : ctaid=(11,14,0) tid=(1,9,0)
GPGPU-Sim PTX: 14000000 instructions simulated : ctaid=(7,6,0) tid=(5,1,0)
GPGPU-Sim PTX: 14100000 instructions simulated : ctaid=(9,13,0) tid=(7,2,0)
GPGPU-Sim PTX: 14200000 instructions simulated : ctaid=(14,8,0) tid=(5,1,0)
GPGPU-Sim uArch: cycles simulated: 24000  inst.: 14188812 (ipc=591.2) sim_rate=262755 (inst/sec) elapsed = 0:0:00:54 / Mon Jun 14 16:15:43 2021
GPGPU-Sim PTX: 14300000 instructions simulated : ctaid=(10,11,0) tid=(7,0,0)
GPGPU-Sim PTX: 14400000 instructions simulated : ctaid=(13,13,0) tid=(3,2,0)
GPGPU-Sim PTX: 14500000 instructions simulated : ctaid=(3,10,0) tid=(3,2,0)
GPGPU-Sim uArch: cycles simulated: 24500  inst.: 14517376 (ipc=592.5) sim_rate=263952 (inst/sec) elapsed = 0:0:00:55 / Mon Jun 14 16:15:44 2021
GPGPU-Sim PTX: 14600000 instructions simulated : ctaid=(12,7,0) tid=(5,1,0)
GPGPU-Sim PTX: 14700000 instructions simulated : ctaid=(7,7,0) tid=(5,7,0)
GPGPU-Sim PTX: 14800000 instructions simulated : ctaid=(13,8,0) tid=(3,4,0)
GPGPU-Sim uArch: cycles simulated: 25000  inst.: 14846428 (ipc=593.9) sim_rate=265114 (inst/sec) elapsed = 0:0:00:56 / Mon Jun 14 16:15:45 2021
GPGPU-Sim PTX: 14900000 instructions simulated : ctaid=(9,5,0) tid=(9,9,0)
GPGPU-Sim PTX: 15000000 instructions simulated : ctaid=(2,8,0) tid=(3,8,0)
GPGPU-Sim PTX: 15100000 instructions simulated : ctaid=(8,13,0) tid=(9,5,0)
GPGPU-Sim PTX: 15200000 instructions simulated : ctaid=(4,12,0) tid=(1,1,0)
GPGPU-Sim uArch: cycles simulated: 25500  inst.: 15182476 (ipc=595.4) sim_rate=266359 (inst/sec) elapsed = 0:0:00:57 / Mon Jun 14 16:15:46 2021
GPGPU-Sim PTX: 15300000 instructions simulated : ctaid=(6,12,0) tid=(3,2,0)
GPGPU-Sim PTX: 15400000 instructions simulated : ctaid=(13,0,0) tid=(1,7,0)
GPGPU-Sim PTX: 15500000 instructions simulated : ctaid=(10,1,0) tid=(1,1,0)
GPGPU-Sim uArch: cycles simulated: 26000  inst.: 15506780 (ipc=596.4) sim_rate=267358 (inst/sec) elapsed = 0:0:00:58 / Mon Jun 14 16:15:47 2021
GPGPU-Sim PTX: 15600000 instructions simulated : ctaid=(2,11,0) tid=(1,1,0)
GPGPU-Sim PTX: 15700000 instructions simulated : ctaid=(5,11,0) tid=(9,7,0)
GPGPU-Sim PTX: 15800000 instructions simulated : ctaid=(2,13,0) tid=(9,1,0)
GPGPU-Sim PTX: 15900000 instructions simulated : ctaid=(6,5,0) tid=(9,1,0)
GPGPU-Sim uArch: cycles simulated: 26500  inst.: 15853292 (ipc=598.2) sim_rate=268699 (inst/sec) elapsed = 0:0:00:59 / Mon Jun 14 16:15:48 2021
GPGPU-Sim PTX: 16000000 instructions simulated : ctaid=(13,2,0) tid=(7,2,0)
GPGPU-Sim PTX: 16100000 instructions simulated : ctaid=(14,0,0) tid=(1,9,0)
GPGPU-Sim PTX: 16200000 instructions simulated : ctaid=(12,10,0) tid=(9,7,0)
GPGPU-Sim uArch: cycles simulated: 27000  inst.: 16182996 (ipc=599.4) sim_rate=269716 (inst/sec) elapsed = 0:0:01:00 / Mon Jun 14 16:15:49 2021
GPGPU-Sim PTX: 16300000 instructions simulated : ctaid=(5,10,0) tid=(3,6,0)
GPGPU-Sim PTX: 16400000 instructions simulated : ctaid=(13,10,0) tid=(7,0,0)
GPGPU-Sim PTX: 16500000 instructions simulated : ctaid=(14,13,0) tid=(5,7,0)
GPGPU-Sim uArch: cycles simulated: 27500  inst.: 16529016 (ipc=601.1) sim_rate=266597 (inst/sec) elapsed = 0:0:01:02 / Mon Jun 14 16:15:51 2021
GPGPU-Sim PTX: 16600000 instructions simulated : ctaid=(11,10,0) tid=(3,8,0)
GPGPU-Sim PTX: 16700000 instructions simulated : ctaid=(12,6,0) tid=(5,9,0)
GPGPU-Sim PTX: 16800000 instructions simulated : ctaid=(1,12,0) tid=(7,8,0)
GPGPU-Sim PTX: 16900000 instructions simulated : ctaid=(9,11,0) tid=(7,8,0)
GPGPU-Sim uArch: cycles simulated: 28000  inst.: 16859168 (ipc=602.1) sim_rate=267605 (inst/sec) elapsed = 0:0:01:03 / Mon Jun 14 16:15:52 2021
GPGPU-Sim PTX: 17000000 instructions simulated : ctaid=(5,5,0) tid=(5,5,0)
GPGPU-Sim PTX: 17100000 instructions simulated : ctaid=(7,11,0) tid=(1,9,0)
GPGPU-Sim PTX: 17200000 instructions simulated : ctaid=(6,7,0) tid=(1,1,0)
GPGPU-Sim uArch: cycles simulated: 28500  inst.: 17206480 (ipc=603.7) sim_rate=268851 (inst/sec) elapsed = 0:0:01:04 / Mon Jun 14 16:15:53 2021
GPGPU-Sim PTX: 17300000 instructions simulated : ctaid=(14,3,0) tid=(9,1,0)
GPGPU-Sim PTX: 17400000 instructions simulated : ctaid=(1,11,0) tid=(9,1,0)
GPGPU-Sim PTX: 17500000 instructions simulated : ctaid=(9,3,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 29000  inst.: 17530396 (ipc=604.5) sim_rate=269698 (inst/sec) elapsed = 0:0:01:05 / Mon Jun 14 16:15:54 2021
GPGPU-Sim PTX: 17600000 instructions simulated : ctaid=(8,5,0) tid=(9,7,0)
GPGPU-Sim PTX: 17700000 instructions simulated : ctaid=(9,2,0) tid=(1,3,0)
GPGPU-Sim PTX: 17800000 instructions simulated : ctaid=(11,11,0) tid=(5,9,0)
GPGPU-Sim PTX: 17900000 instructions simulated : ctaid=(14,8,0) tid=(7,0,0)
GPGPU-Sim uArch: cycles simulated: 29500  inst.: 17869380 (ipc=605.7) sim_rate=270748 (inst/sec) elapsed = 0:0:01:06 / Mon Jun 14 16:15:55 2021
GPGPU-Sim PTX: 18000000 instructions simulated : ctaid=(11,8,0) tid=(1,1,0)
GPGPU-Sim PTX: 18100000 instructions simulated : ctaid=(9,9,0) tid=(3,0,0)
GPGPU-Sim PTX: 18200000 instructions simulated : ctaid=(12,13,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 30000  inst.: 18174448 (ipc=605.8) sim_rate=271260 (inst/sec) elapsed = 0:0:01:07 / Mon Jun 14 16:15:56 2021
GPGPU-Sim PTX: 18300000 instructions simulated : ctaid=(4,13,0) tid=(3,8,0)
GPGPU-Sim PTX: 18400000 instructions simulated : ctaid=(1,14,0) tid=(9,1,0)
GPGPU-Sim PTX: 18500000 instructions simulated : ctaid=(0,0,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 30500  inst.: 18511332 (ipc=606.9) sim_rate=272225 (inst/sec) elapsed = 0:0:01:08 / Mon Jun 14 16:15:57 2021
GPGPU-Sim PTX: 18600000 instructions simulated : ctaid=(11,5,0) tid=(7,6,0)
GPGPU-Sim PTX: 18700000 instructions simulated : ctaid=(5,8,0) tid=(7,4,0)
GPGPU-Sim PTX: 18800000 instructions simulated : ctaid=(3,14,0) tid=(9,3,0)
GPGPU-Sim PTX: 18900000 instructions simulated : ctaid=(1,10,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 31000  inst.: 18869068 (ipc=608.7) sim_rate=269558 (inst/sec) elapsed = 0:0:01:10 / Mon Jun 14 16:15:59 2021
GPGPU-Sim PTX: 19000000 instructions simulated : ctaid=(11,0,0) tid=(9,3,0)
GPGPU-Sim PTX: 19100000 instructions simulated : ctaid=(1,4,0) tid=(7,8,0)
GPGPU-Sim PTX: 19200000 instructions simulated : ctaid=(4,9,0) tid=(9,3,0)
GPGPU-Sim uArch: cycles simulated: 31500  inst.: 19201680 (ipc=609.6) sim_rate=270446 (inst/sec) elapsed = 0:0:01:11 / Mon Jun 14 16:16:00 2021
GPGPU-Sim PTX: 19300000 instructions simulated : ctaid=(6,13,0) tid=(9,9,0)
GPGPU-Sim PTX: 19400000 instructions simulated : ctaid=(11,0,0) tid=(9,1,0)
GPGPU-Sim PTX: 19500000 instructions simulated : ctaid=(9,2,0) tid=(1,5,0)
GPGPU-Sim uArch: cycles simulated: 32000  inst.: 19534468 (ipc=610.5) sim_rate=271312 (inst/sec) elapsed = 0:0:01:12 / Mon Jun 14 16:16:01 2021
GPGPU-Sim PTX: 19600000 instructions simulated : ctaid=(8,2,0) tid=(1,5,0)
GPGPU-Sim PTX: 19700000 instructions simulated : ctaid=(11,14,0) tid=(7,6,0)
GPGPU-Sim PTX: 19800000 instructions simulated : ctaid=(12,2,0) tid=(7,0,0)
GPGPU-Sim uArch: cycles simulated: 32500  inst.: 19828596 (ipc=610.1) sim_rate=271624 (inst/sec) elapsed = 0:0:01:13 / Mon Jun 14 16:16:02 2021
GPGPU-Sim PTX: 19900000 instructions simulated : ctaid=(4,3,0) tid=(3,6,0)
GPGPU-Sim PTX: 20000000 instructions simulated : ctaid=(11,5,0) tid=(9,3,0)
GPGPU-Sim PTX: 20100000 instructions simulated : ctaid=(1,1,0) tid=(1,3,0)
GPGPU-Sim PTX: 20200000 instructions simulated : ctaid=(4,2,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 33000  inst.: 20172780 (ipc=611.3) sim_rate=272605 (inst/sec) elapsed = 0:0:01:14 / Mon Jun 14 16:16:03 2021
GPGPU-Sim PTX: 20300000 instructions simulated : ctaid=(9,5,0) tid=(1,1,0)
GPGPU-Sim PTX: 20400000 instructions simulated : ctaid=(10,10,0) tid=(9,5,0)
GPGPU-Sim PTX: 20500000 instructions simulated : ctaid=(9,4,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 33500  inst.: 20511528 (ipc=612.3) sim_rate=273487 (inst/sec) elapsed = 0:0:01:15 / Mon Jun 14 16:16:04 2021
GPGPU-Sim PTX: 20600000 instructions simulated : ctaid=(6,12,0) tid=(1,7,0)
GPGPU-Sim PTX: 20700000 instructions simulated : ctaid=(12,12,0) tid=(7,2,0)
GPGPU-Sim PTX: 20800000 instructions simulated : ctaid=(8,4,0) tid=(9,7,0)
GPGPU-Sim PTX: 20900000 instructions simulated : ctaid=(8,6,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 34000  inst.: 20866696 (ipc=613.7) sim_rate=274561 (inst/sec) elapsed = 0:0:01:16 / Mon Jun 14 16:16:05 2021
GPGPU-Sim PTX: 21000000 instructions simulated : ctaid=(8,9,0) tid=(5,9,0)
GPGPU-Sim PTX: 21100000 instructions simulated : ctaid=(7,2,0) tid=(9,1,0)
GPGPU-Sim PTX: 21200000 instructions simulated : ctaid=(1,3,0) tid=(5,3,0)
GPGPU-Sim uArch: cycles simulated: 34500  inst.: 21196192 (ipc=614.4) sim_rate=275275 (inst/sec) elapsed = 0:0:01:17 / Mon Jun 14 16:16:06 2021
GPGPU-Sim PTX: 21300000 instructions simulated : ctaid=(6,6,0) tid=(5,9,0)
GPGPU-Sim PTX: 21400000 instructions simulated : ctaid=(4,2,0) tid=(9,5,0)
GPGPU-Sim PTX: 21500000 instructions simulated : ctaid=(2,5,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 35000  inst.: 21528812 (ipc=615.1) sim_rate=272516 (inst/sec) elapsed = 0:0:01:19 / Mon Jun 14 16:16:08 2021
GPGPU-Sim PTX: 21600000 instructions simulated : ctaid=(12,14,0) tid=(1,3,0)
GPGPU-Sim PTX: 21700000 instructions simulated : ctaid=(11,9,0) tid=(9,7,0)
GPGPU-Sim PTX: 21800000 instructions simulated : ctaid=(5,14,0) tid=(3,0,0)
GPGPU-Sim PTX: 21900000 instructions simulated : ctaid=(12,3,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 35500  inst.: 21850644 (ipc=615.5) sim_rate=273133 (inst/sec) elapsed = 0:0:01:20 / Mon Jun 14 16:16:09 2021
GPGPU-Sim PTX: 22000000 instructions simulated : ctaid=(5,2,0) tid=(9,9,0)
GPGPU-Sim PTX: 22100000 instructions simulated : ctaid=(6,0,0) tid=(5,9,0)
GPGPU-Sim PTX: 22200000 instructions simulated : ctaid=(11,9,0) tid=(7,4,0)
GPGPU-Sim uArch: cycles simulated: 36000  inst.: 22180956 (ipc=616.1) sim_rate=273838 (inst/sec) elapsed = 0:0:01:21 / Mon Jun 14 16:16:10 2021
GPGPU-Sim PTX: 22300000 instructions simulated : ctaid=(2,1,0) tid=(5,3,0)
GPGPU-Sim PTX: 22400000 instructions simulated : ctaid=(1,5,0) tid=(5,3,0)
GPGPU-Sim PTX: 22500000 instructions simulated : ctaid=(2,4,0) tid=(7,0,0)
GPGPU-Sim uArch: cycles simulated: 36500  inst.: 22532600 (ipc=617.3) sim_rate=274787 (inst/sec) elapsed = 0:0:01:22 / Mon Jun 14 16:16:11 2021
GPGPU-Sim PTX: 22600000 instructions simulated : ctaid=(14,9,0) tid=(1,7,0)
GPGPU-Sim PTX: 22700000 instructions simulated : ctaid=(2,12,0) tid=(1,7,0)
GPGPU-Sim PTX: 22800000 instructions simulated : ctaid=(5,14,0) tid=(9,5,0)
GPGPU-Sim PTX: 22900000 instructions simulated : ctaid=(4,0,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 37000  inst.: 22872968 (ipc=618.2) sim_rate=275577 (inst/sec) elapsed = 0:0:01:23 / Mon Jun 14 16:16:12 2021
GPGPU-Sim PTX: 23000000 instructions simulated : ctaid=(13,13,0) tid=(7,2,0)
GPGPU-Sim PTX: 23100000 instructions simulated : ctaid=(6,3,0) tid=(3,0,0)
GPGPU-Sim PTX: 23200000 instructions simulated : ctaid=(3,10,0) tid=(3,6,0)
GPGPU-Sim uArch: cycles simulated: 37500  inst.: 23195320 (ipc=618.5) sim_rate=276134 (inst/sec) elapsed = 0:0:01:24 / Mon Jun 14 16:16:13 2021
GPGPU-Sim PTX: 23300000 instructions simulated : ctaid=(8,6,0) tid=(9,1,0)
GPGPU-Sim PTX: 23400000 instructions simulated : ctaid=(1,0,0) tid=(9,7,0)
GPGPU-Sim PTX: 23500000 instructions simulated : ctaid=(0,10,0) tid=(7,8,0)
GPGPU-Sim uArch: cycles simulated: 38000  inst.: 23505996 (ipc=618.6) sim_rate=276541 (inst/sec) elapsed = 0:0:01:25 / Mon Jun 14 16:16:14 2021
GPGPU-Sim PTX: 23600000 instructions simulated : ctaid=(5,11,0) tid=(1,1,0)
GPGPU-Sim PTX: 23700000 instructions simulated : ctaid=(13,0,0) tid=(9,3,0)
GPGPU-Sim PTX: 23800000 instructions simulated : ctaid=(1,10,0) tid=(9,1,0)
GPGPU-Sim uArch: cycles simulated: 38500  inst.: 23837924 (ipc=619.2) sim_rate=277185 (inst/sec) elapsed = 0:0:01:26 / Mon Jun 14 16:16:15 2021
GPGPU-Sim PTX: 23900000 instructions simulated : ctaid=(4,3,0) tid=(5,1,0)
GPGPU-Sim PTX: 24000000 instructions simulated : ctaid=(13,7,0) tid=(1,9,0)
GPGPU-Sim PTX: 24100000 instructions simulated : ctaid=(6,1,0) tid=(9,7,0)
GPGPU-Sim PTX: 24200000 instructions simulated : ctaid=(5,11,0) tid=(5,3,0)
GPGPU-Sim uArch: cycles simulated: 39000  inst.: 24174400 (ipc=619.9) sim_rate=274709 (inst/sec) elapsed = 0:0:01:28 / Mon Jun 14 16:16:17 2021
GPGPU-Sim PTX: 24300000 instructions simulated : ctaid=(13,6,0) tid=(7,2,0)
GPGPU-Sim PTX: 24400000 instructions simulated : ctaid=(5,7,0) tid=(5,1,0)
GPGPU-Sim PTX: 24500000 instructions simulated : ctaid=(10,14,0) tid=(7,6,0)
GPGPU-Sim uArch: cycles simulated: 39500  inst.: 24517452 (ipc=620.7) sim_rate=275476 (inst/sec) elapsed = 0:0:01:29 / Mon Jun 14 16:16:18 2021
GPGPU-Sim PTX: 24600000 instructions simulated : ctaid=(7,7,0) tid=(1,9,0)
GPGPU-Sim PTX: 24700000 instructions simulated : ctaid=(0,8,0) tid=(3,8,0)
GPGPU-Sim PTX: 24800000 instructions simulated : ctaid=(8,8,0) tid=(1,5,0)
GPGPU-Sim uArch: cycles simulated: 40000  inst.: 24838280 (ipc=621.0) sim_rate=275980 (inst/sec) elapsed = 0:0:01:30 / Mon Jun 14 16:16:19 2021
GPGPU-Sim PTX: 24900000 instructions simulated : ctaid=(13,2,0) tid=(1,9,0)
GPGPU-Sim PTX: 25000000 instructions simulated : ctaid=(10,0,0) tid=(3,4,0)
GPGPU-Sim PTX: 25100000 instructions simulated : ctaid=(4,5,0) tid=(9,1,0)
GPGPU-Sim PTX: 25200000 instructions simulated : ctaid=(11,13,0) tid=(5,1,0)
GPGPU-Sim uArch: cycles simulated: 40500  inst.: 25182424 (ipc=621.8) sim_rate=276729 (inst/sec) elapsed = 0:0:01:31 / Mon Jun 14 16:16:20 2021
GPGPU-Sim PTX: 25300000 instructions simulated : ctaid=(14,12,0) tid=(1,1,0)
GPGPU-Sim PTX: 25400000 instructions simulated : ctaid=(4,3,0) tid=(1,1,0)
GPGPU-Sim PTX: 25500000 instructions simulated : ctaid=(6,10,0) tid=(3,6,0)
GPGPU-Sim uArch: cycles simulated: 41000  inst.: 25501984 (ipc=622.0) sim_rate=277195 (inst/sec) elapsed = 0:0:01:32 / Mon Jun 14 16:16:21 2021
GPGPU-Sim PTX: 25600000 instructions simulated : ctaid=(14,0,0) tid=(3,6,0)
GPGPU-Sim PTX: 25700000 instructions simulated : ctaid=(7,3,0) tid=(7,2,0)
GPGPU-Sim PTX: 25800000 instructions simulated : ctaid=(1,8,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 41500  inst.: 25839444 (ipc=622.6) sim_rate=277843 (inst/sec) elapsed = 0:0:01:33 / Mon Jun 14 16:16:22 2021
GPGPU-Sim PTX: 25900000 instructions simulated : ctaid=(13,4,0) tid=(7,0,0)
GPGPU-Sim PTX: 26000000 instructions simulated : ctaid=(12,14,0) tid=(5,5,0)
GPGPU-Sim PTX: 26100000 instructions simulated : ctaid=(8,14,0) tid=(1,7,0)
GPGPU-Sim PTX: 26200000 instructions simulated : ctaid=(8,0,0) tid=(3,6,0)
GPGPU-Sim uArch: cycles simulated: 42000  inst.: 26158016 (ipc=622.8) sim_rate=278276 (inst/sec) elapsed = 0:0:01:34 / Mon Jun 14 16:16:23 2021
GPGPU-Sim PTX: 26300000 instructions simulated : ctaid=(6,9,0) tid=(9,1,0)
GPGPU-Sim PTX: 26400000 instructions simulated : ctaid=(4,6,0) tid=(9,9,0)
GPGPU-Sim PTX: 26500000 instructions simulated : ctaid=(7,14,0) tid=(7,8,0)
GPGPU-Sim uArch: cycles simulated: 42500  inst.: 26488808 (ipc=623.3) sim_rate=278829 (inst/sec) elapsed = 0:0:01:35 / Mon Jun 14 16:16:24 2021
GPGPU-Sim PTX: 26600000 instructions simulated : ctaid=(0,1,0) tid=(3,4,0)
GPGPU-Sim PTX: 26700000 instructions simulated : ctaid=(13,1,0) tid=(5,1,0)
GPGPU-Sim PTX: 26800000 instructions simulated : ctaid=(14,1,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 43000  inst.: 26799472 (ipc=623.2) sim_rate=279161 (inst/sec) elapsed = 0:0:01:36 / Mon Jun 14 16:16:25 2021
GPGPU-Sim PTX: 26900000 instructions simulated : ctaid=(14,5,0) tid=(1,9,0)
GPGPU-Sim PTX: 27000000 instructions simulated : ctaid=(9,8,0) tid=(5,7,0)
GPGPU-Sim PTX: 27100000 instructions simulated : ctaid=(3,1,0) tid=(5,7,0)
GPGPU-Sim uArch: cycles simulated: 43500  inst.: 27137944 (ipc=623.9) sim_rate=276917 (inst/sec) elapsed = 0:0:01:38 / Mon Jun 14 16:16:27 2021
GPGPU-Sim PTX: 27200000 instructions simulated : ctaid=(5,13,0) tid=(7,0,0)
GPGPU-Sim PTX: 27300000 instructions simulated : ctaid=(11,8,0) tid=(7,6,0)
GPGPU-Sim PTX: 27400000 instructions simulated : ctaid=(8,1,0) tid=(9,3,0)
GPGPU-Sim PTX: 27500000 instructions simulated : ctaid=(10,4,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 44000  inst.: 27461176 (ipc=624.1) sim_rate=277385 (inst/sec) elapsed = 0:0:01:39 / Mon Jun 14 16:16:28 2021
GPGPU-Sim PTX: 27600000 instructions simulated : ctaid=(3,4,0) tid=(5,5,0)
GPGPU-Sim PTX: 27700000 instructions simulated : ctaid=(6,2,0) tid=(3,2,0)
GPGPU-Sim PTX: 27800000 instructions simulated : ctaid=(0,8,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 44500  inst.: 27822080 (ipc=625.2) sim_rate=278220 (inst/sec) elapsed = 0:0:01:40 / Mon Jun 14 16:16:29 2021
GPGPU-Sim PTX: 27900000 instructions simulated : ctaid=(6,10,0) tid=(5,7,0)
GPGPU-Sim PTX: 28000000 instructions simulated : ctaid=(9,6,0) tid=(1,3,0)
GPGPU-Sim PTX: 28100000 instructions simulated : ctaid=(10,3,0) tid=(5,3,0)
GPGPU-Sim uArch: cycles simulated: 45000  inst.: 28146700 (ipc=625.5) sim_rate=278680 (inst/sec) elapsed = 0:0:01:41 / Mon Jun 14 16:16:30 2021
GPGPU-Sim PTX: 28200000 instructions simulated : ctaid=(9,2,0) tid=(1,1,0)
GPGPU-Sim PTX: 28300000 instructions simulated : ctaid=(10,6,0) tid=(5,5,0)
GPGPU-Sim PTX: 28400000 instructions simulated : ctaid=(2,0,0) tid=(1,5,0)
GPGPU-Sim PTX: 28500000 instructions simulated : ctaid=(4,14,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 45500  inst.: 28480056 (ipc=625.9) sim_rate=279216 (inst/sec) elapsed = 0:0:01:42 / Mon Jun 14 16:16:31 2021
GPGPU-Sim PTX: 28600000 instructions simulated : ctaid=(8,11,0) tid=(7,4,0)
GPGPU-Sim PTX: 28700000 instructions simulated : ctaid=(13,7,0) tid=(9,5,0)
GPGPU-Sim PTX: 28800000 instructions simulated : ctaid=(8,10,0) tid=(9,1,0)
GPGPU-Sim uArch: cycles simulated: 46000  inst.: 28803560 (ipc=626.2) sim_rate=279646 (inst/sec) elapsed = 0:0:01:43 / Mon Jun 14 16:16:32 2021
GPGPU-Sim PTX: 28900000 instructions simulated : ctaid=(14,4,0) tid=(1,9,0)
GPGPU-Sim PTX: 29000000 instructions simulated : ctaid=(7,5,0) tid=(9,1,0)
GPGPU-Sim PTX: 29100000 instructions simulated : ctaid=(11,1,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 46500  inst.: 29129700 (ipc=626.4) sim_rate=280093 (inst/sec) elapsed = 0:0:01:44 / Mon Jun 14 16:16:33 2021
GPGPU-Sim PTX: 29200000 instructions simulated : ctaid=(2,9,0) tid=(9,1,0)
GPGPU-Sim PTX: 29300000 instructions simulated : ctaid=(14,6,0) tid=(3,8,0)
GPGPU-Sim PTX: 29400000 instructions simulated : ctaid=(12,14,0) tid=(5,1,0)
GPGPU-Sim PTX: 29500000 instructions simulated : ctaid=(8,14,0) tid=(3,2,0)
GPGPU-Sim uArch: cycles simulated: 47000  inst.: 29456612 (ipc=626.7) sim_rate=280539 (inst/sec) elapsed = 0:0:01:45 / Mon Jun 14 16:16:34 2021
GPGPU-Sim PTX: 29600000 instructions simulated : ctaid=(12,14,0) tid=(7,0,0)
GPGPU-Sim PTX: 29700000 instructions simulated : ctaid=(8,9,0) tid=(5,9,0)
GPGPU-Sim PTX: 29800000 instructions simulated : ctaid=(7,13,0) tid=(3,4,0)
GPGPU-Sim uArch: cycles simulated: 47500  inst.: 29786252 (ipc=627.1) sim_rate=278376 (inst/sec) elapsed = 0:0:01:47 / Mon Jun 14 16:16:36 2021
GPGPU-Sim PTX: 29900000 instructions simulated : ctaid=(5,14,0) tid=(7,0,0)
GPGPU-Sim PTX: 30000000 instructions simulated : ctaid=(12,9,0) tid=(3,8,0)
GPGPU-Sim PTX: 30100000 instructions simulated : ctaid=(7,9,0) tid=(9,9,0)
GPGPU-Sim uArch: cycles simulated: 48000  inst.: 30128620 (ipc=627.7) sim_rate=278968 (inst/sec) elapsed = 0:0:01:48 / Mon Jun 14 16:16:37 2021
GPGPU-Sim PTX: 30200000 instructions simulated : ctaid=(7,3,0) tid=(7,2,0)
GPGPU-Sim PTX: 30300000 instructions simulated : ctaid=(5,5,0) tid=(5,1,0)
GPGPU-Sim PTX: 30400000 instructions simulated : ctaid=(8,2,0) tid=(5,5,0)
GPGPU-Sim PTX: 30500000 instructions simulated : ctaid=(5,7,0) tid=(7,0,0)
GPGPU-Sim uArch: cycles simulated: 48500  inst.: 30458644 (ipc=628.0) sim_rate=279437 (inst/sec) elapsed = 0:0:01:49 / Mon Jun 14 16:16:38 2021
GPGPU-Sim PTX: 30600000 instructions simulated : ctaid=(8,9,0) tid=(9,3,0)
GPGPU-Sim PTX: 30700000 instructions simulated : ctaid=(11,6,0) tid=(9,7,0)
GPGPU-Sim PTX: 30800000 instructions simulated : ctaid=(14,14,0) tid=(5,3,0)
GPGPU-Sim uArch: cycles simulated: 49000  inst.: 30791028 (ipc=628.4) sim_rate=279918 (inst/sec) elapsed = 0:0:01:50 / Mon Jun 14 16:16:39 2021
GPGPU-Sim PTX: 30900000 instructions simulated : ctaid=(1,13,0) tid=(7,2,0)
GPGPU-Sim PTX: 31000000 instructions simulated : ctaid=(5,13,0) tid=(3,8,0)
GPGPU-Sim PTX: 31100000 instructions simulated : ctaid=(14,8,0) tid=(9,7,0)
GPGPU-Sim uArch: cycles simulated: 49500  inst.: 31060536 (ipc=627.5) sim_rate=279824 (inst/sec) elapsed = 0:0:01:51 / Mon Jun 14 16:16:40 2021
GPGPU-Sim uArch: cycles simulated: 50000  inst.: 31139460 (ipc=622.8) sim_rate=278030 (inst/sec) elapsed = 0:0:01:52 / Mon Jun 14 16:16:41 2021
GPGPU-Sim PTX: 31200000 instructions simulated : ctaid=(11,12,0) tid=(1,5,0)
GPGPU-Sim uArch: Shader 52 finished CTA #0 (50169,0), 1 CTAs running
GPGPU-Sim uArch: Shader 9 finished CTA #0 (50210,0), 1 CTAs running
GPGPU-Sim uArch: Shader 52 finished CTA #1 (50232,0), 0 CTAs running
GPGPU-Sim uArch: Shader 52 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 28 finished CTA #0 (50379,0), 1 CTAs running
GPGPU-Sim uArch: Shader 42 finished CTA #0 (50398,0), 1 CTAs running
GPGPU-Sim uArch: Shader 57 finished CTA #0 (50496,0), 1 CTAs running
GPGPU-Sim uArch: cycles simulated: 50500  inst.: 31175264 (ipc=617.3) sim_rate=275887 (inst/sec) elapsed = 0:0:01:53 / Mon Jun 14 16:16:42 2021
GPGPU-Sim uArch: Shader 65 finished CTA #0 (50597,0), 1 CTAs running
GPGPU-Sim uArch: Shader 104 finished CTA #0 (50638,0), 1 CTAs running
GPGPU-Sim uArch: Shader 30 finished CTA #1 (50665,0), 1 CTAs running
GPGPU-Sim uArch: Shader 112 finished CTA #0 (50754,0), 1 CTAs running
GPGPU-Sim uArch: Shader 28 finished CTA #1 (50810,0), 0 CTAs running
GPGPU-Sim uArch: Shader 28 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 57 finished CTA #1 (50829,0), 0 CTAs running
GPGPU-Sim uArch: Shader 57 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 75 finished CTA #0 (50837,0), 1 CTAs running
GPGPU-Sim uArch: Shader 106 finished CTA #0 (50839,0), 1 CTAs running
GPGPU-Sim uArch: Shader 60 finished CTA #0 (50843,0), 1 CTAs running
GPGPU-Sim uArch: Shader 47 finished CTA #0 (50845,0), 0 CTAs running
GPGPU-Sim uArch: Shader 47 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 88 finished CTA #0 (50861,0), 1 CTAs running
GPGPU-Sim uArch: Shader 9 finished CTA #1 (50862,0), 0 CTAs running
GPGPU-Sim uArch: Shader 9 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 88 finished CTA #1 (50894,0), 0 CTAs running
GPGPU-Sim uArch: Shader 88 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 112 finished CTA #1 (50899,0), 0 CTAs running
GPGPU-Sim uArch: Shader 112 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 87 finished CTA #0 (50903,0), 0 CTAs running
GPGPU-Sim uArch: Shader 87 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 60 finished CTA #1 (50905,0), 0 CTAs running
GPGPU-Sim uArch: Shader 60 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 55 finished CTA #0 (50915,0), 0 CTAs running
GPGPU-Sim uArch: Shader 55 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 30 finished CTA #0 (50933,0), 0 CTAs running
GPGPU-Sim uArch: Shader 30 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 32 finished CTA #0 (50948,0), 1 CTAs running
GPGPU-Sim uArch: Shader 39 finished CTA #0 (50953,0), 0 CTAs running
GPGPU-Sim uArch: Shader 39 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 73 finished CTA #0 (50959,0), 1 CTAs running
GPGPU-Sim uArch: Shader 114 finished CTA #0 (50971,0), 1 CTAs running
GPGPU-Sim uArch: Shader 80 finished CTA #0 (50985,0), 1 CTAs running
GPGPU-Sim uArch: Shader 0 finished CTA #0 (50990,0), 1 CTAs running
GPGPU-Sim uArch: cycles simulated: 51000  inst.: 31177832 (ipc=611.3) sim_rate=273489 (inst/sec) elapsed = 0:0:01:54 / Mon Jun 14 16:16:43 2021
GPGPU-Sim uArch: Shader 29 finished CTA #1 (51027,0), 1 CTAs running
GPGPU-Sim uArch: Shader 23 finished CTA #0 (51028,0), 0 CTAs running
GPGPU-Sim uArch: Shader 23 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 49 finished CTA #0 (51028,0), 1 CTAs running
GPGPU-Sim uArch: Shader 63 finished CTA #0 (51035,0), 0 CTAs running
GPGPU-Sim uArch: Shader 63 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 29 finished CTA #0 (51056,0), 0 CTAs running
GPGPU-Sim uArch: Shader 29 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 86 finished CTA #0 (51059,0), 1 CTAs running
GPGPU-Sim uArch: Shader 32 finished CTA #1 (51065,0), 0 CTAs running
GPGPU-Sim uArch: Shader 32 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 17 finished CTA #0 (51066,0), 1 CTAs running
GPGPU-Sim uArch: Shader 96 finished CTA #0 (51082,0), 1 CTAs running
GPGPU-Sim uArch: Shader 73 finished CTA #1 (51111,0), 0 CTAs running
GPGPU-Sim uArch: Shader 73 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 89 finished CTA #1 (51114,0), 1 CTAs running
GPGPU-Sim uArch: Shader 74 finished CTA #0 (51119,0), 1 CTAs running
GPGPU-Sim uArch: Shader 34 finished CTA #0 (51138,0), 1 CTAs running
GPGPU-Sim uArch: Shader 94 finished CTA #0 (51144,0), 1 CTAs running
GPGPU-Sim uArch: Shader 17 finished CTA #1 (51149,0), 0 CTAs running
GPGPU-Sim uArch: Shader 17 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 92 finished CTA #0 (51157,0), 1 CTAs running
GPGPU-Sim uArch: Shader 109 finished CTA #0 (51193,0), 1 CTAs running
GPGPU-Sim uArch: Shader 75 finished CTA #1 (51194,0), 0 CTAs running
GPGPU-Sim uArch: Shader 75 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 90 finished CTA #0 (51200,0), 1 CTAs running
GPGPU-Sim uArch: Shader 15 finished CTA #0 (51206,0), 0 CTAs running
GPGPU-Sim uArch: Shader 15 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 26 finished CTA #0 (51206,0), 1 CTAs running
GPGPU-Sim uArch: Shader 72 finished CTA #0 (51210,0), 1 CTAs running
GPGPU-Sim uArch: Shader 89 finished CTA #0 (51216,0), 0 CTAs running
GPGPU-Sim uArch: Shader 89 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 41 finished CTA #0 (51219,0), 1 CTAs running
GPGPU-Sim uArch: Shader 2 finished CTA #0 (51227,0), 1 CTAs running
GPGPU-Sim uArch: Shader 53 finished CTA #1 (51230,0), 1 CTAs running
GPGPU-Sim uArch: Shader 37 finished CTA #0 (51232,0), 1 CTAs running
GPGPU-Sim uArch: Shader 56 finished CTA #0 (51241,0), 1 CTAs running
GPGPU-Sim uArch: Shader 50 finished CTA #0 (51243,0), 1 CTAs running
GPGPU-Sim uArch: Shader 65 finished CTA #1 (51243,0), 0 CTAs running
GPGPU-Sim uArch: Shader 65 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 10 finished CTA #0 (51255,0), 1 CTAs running
GPGPU-Sim uArch: Shader 16 finished CTA #0 (51260,0), 1 CTAs running
GPGPU-Sim uArch: Shader 96 finished CTA #1 (51266,0), 0 CTAs running
GPGPU-Sim uArch: Shader 96 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 66 finished CTA #0 (51293,0), 1 CTAs running
GPGPU-Sim uArch: Shader 81 finished CTA #0 (51296,0), 1 CTAs running
GPGPU-Sim uArch: Shader 62 finished CTA #1 (51304,0), 1 CTAs running
GPGPU-Sim uArch: Shader 41 finished CTA #1 (51308,0), 0 CTAs running
GPGPU-Sim uArch: Shader 41 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 114 finished CTA #1 (51320,0), 0 CTAs running
GPGPU-Sim uArch: Shader 114 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 71 finished CTA #0 (51330,0), 0 CTAs running
GPGPU-Sim uArch: Shader 71 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 61 finished CTA #0 (51340,0), 1 CTAs running
GPGPU-Sim uArch: Shader 58 finished CTA #1 (51345,0), 1 CTAs running
GPGPU-Sim uArch: Shader 103 finished CTA #0 (51352,0), 0 CTAs running
GPGPU-Sim uArch: Shader 103 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 80 finished CTA #1 (51361,0), 0 CTAs running
GPGPU-Sim uArch: Shader 80 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 72 finished CTA #1 (51366,0), 0 CTAs running
GPGPU-Sim uArch: Shader 72 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 0 finished CTA #1 (51368,0), 0 CTAs running
GPGPU-Sim uArch: Shader 0 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 49 finished CTA #1 (51383,0), 0 CTAs running
GPGPU-Sim uArch: Shader 49 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 25 finished CTA #0 (51386,0), 1 CTAs running
GPGPU-Sim uArch: Shader 26 finished CTA #1 (51391,0), 0 CTAs running
GPGPU-Sim uArch: Shader 26 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 94 finished CTA #1 (51399,0), 0 CTAs running
GPGPU-Sim uArch: Shader 94 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 13 finished CTA #0 (51400,0), 1 CTAs running
GPGPU-Sim uArch: Shader 2 finished CTA #1 (51406,0), 0 CTAs running
GPGPU-Sim uArch: Shader 2 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 62 finished CTA #0 (51412,0), 0 CTAs running
GPGPU-Sim uArch: Shader 62 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 91 finished CTA #0 (51415,0), 1 CTAs running
GPGPU-Sim uArch: Shader 90 finished CTA #1 (51420,0), 0 CTAs running
GPGPU-Sim uArch: Shader 90 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 54 finished CTA #0 (51425,0), 1 CTAs running
GPGPU-Sim uArch: Shader 16 finished CTA #1 (51430,0), 0 CTAs running
GPGPU-Sim uArch: Shader 16 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 102 finished CTA #0 (51434,0), 1 CTAs running
GPGPU-Sim uArch: Shader 21 finished CTA #0 (51436,0), 1 CTAs running
GPGPU-Sim uArch: Shader 98 finished CTA #0 (51438,0), 1 CTAs running
GPGPU-Sim uArch: Shader 81 finished CTA #1 (51439,0), 0 CTAs running
GPGPU-Sim uArch: Shader 81 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 31 finished CTA #0 (51446,0), 0 CTAs running
GPGPU-Sim uArch: Shader 31 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 91 finished CTA #1 (51452,0), 0 CTAs running
GPGPU-Sim uArch: Shader 91 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 82 finished CTA #0 (51455,0), 1 CTAs running
GPGPU-Sim uArch: Shader 8 finished CTA #0 (51456,0), 1 CTAs running
GPGPU-Sim uArch: Shader 56 finished CTA #1 (51461,0), 0 CTAs running
GPGPU-Sim uArch: Shader 56 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 95 finished CTA #0 (51462,0), 0 CTAs running
GPGPU-Sim uArch: Shader 95 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 58 finished CTA #0 (51482,0), 0 CTAs running
GPGPU-Sim uArch: Shader 58 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 42 finished CTA #1 (51487,0), 0 CTAs running
GPGPU-Sim uArch: Shader 42 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 86 finished CTA #1 (51492,0), 0 CTAs running
GPGPU-Sim uArch: Shader 86 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 13 finished CTA #1 (51494,0), 0 CTAs running
GPGPU-Sim uArch: Shader 13 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 64 finished CTA #1 (51500,0), 1 CTAs running
GPGPU-Sim uArch: Shader 107 finished CTA #0 (51500,0), 1 CTAs running
GPGPU-Sim uArch: Shader 99 finished CTA #1 (51503,0), 1 CTAs running
GPGPU-Sim uArch: Shader 10 finished CTA #1 (51508,0), 0 CTAs running
GPGPU-Sim uArch: Shader 10 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 20 finished CTA #0 (51509,0), 1 CTAs running
GPGPU-Sim uArch: Shader 76 finished CTA #1 (51516,0), 1 CTAs running
GPGPU-Sim uArch: Shader 82 finished CTA #1 (51518,0), 0 CTAs running
GPGPU-Sim uArch: Shader 82 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 83 finished CTA #0 (51528,0), 1 CTAs running
GPGPU-Sim uArch: Shader 99 finished CTA #0 (51530,0), 0 CTAs running
GPGPU-Sim uArch: Shader 99 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 107 finished CTA #1 (51533,0), 0 CTAs running
GPGPU-Sim uArch: Shader 107 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 59 finished CTA #0 (51536,0), 1 CTAs running
GPGPU-Sim uArch: Shader 33 finished CTA #1 (51538,0), 1 CTAs running
GPGPU-Sim uArch: Shader 106 finished CTA #1 (51549,0), 0 CTAs running
GPGPU-Sim uArch: Shader 106 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 18 finished CTA #0 (51553,0), 1 CTAs running
GPGPU-Sim uArch: Shader 35 finished CTA #0 (51559,0), 1 CTAs running
GPGPU-Sim uArch: Shader 61 finished CTA #1 (51560,0), 0 CTAs running
GPGPU-Sim uArch: Shader 61 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 40 finished CTA #1 (51569,0), 1 CTAs running
GPGPU-Sim uArch: Shader 76 finished CTA #0 (51582,0), 0 CTAs running
GPGPU-Sim uArch: Shader 76 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 83 finished CTA #1 (51582,0), 0 CTAs running
GPGPU-Sim uArch: Shader 83 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 25 finished CTA #1 (51588,0), 0 CTAs running
GPGPU-Sim uArch: Shader 25 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 3 finished CTA #0 (51593,0), 1 CTAs running
GPGPU-Sim uArch: Shader 27 finished CTA #0 (51594,0), 1 CTAs running
GPGPU-Sim uArch: Shader 33 finished CTA #0 (51602,0), 0 CTAs running
GPGPU-Sim uArch: Shader 33 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 48 finished CTA #1 (51614,0), 1 CTAs running
GPGPU-Sim uArch: Shader 119 finished CTA #0 (51624,0), 0 CTAs running
GPGPU-Sim uArch: Shader 119 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 11 finished CTA #0 (51626,0), 1 CTAs running
GPGPU-Sim uArch: Shader 84 finished CTA #0 (51628,0), 1 CTAs running
GPGPU-Sim uArch: Shader 97 finished CTA #0 (51634,0), 1 CTAs running
GPGPU-Sim uArch: Shader 18 finished CTA #1 (51639,0), 0 CTAs running
GPGPU-Sim uArch: Shader 18 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 93 finished CTA #0 (51640,0), 1 CTAs running
GPGPU-Sim uArch: Shader 78 finished CTA #0 (51650,0), 1 CTAs running
GPGPU-Sim uArch: Shader 22 finished CTA #0 (51652,0), 1 CTAs running
GPGPU-Sim uArch: Shader 36 finished CTA #0 (51656,0), 1 CTAs running
GPGPU-Sim uArch: Shader 108 finished CTA #0 (51657,0), 1 CTAs running
GPGPU-Sim uArch: Shader 6 finished CTA #0 (51660,0), 1 CTAs running
GPGPU-Sim uArch: Shader 40 finished CTA #0 (51662,0), 0 CTAs running
GPGPU-Sim uArch: Shader 40 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 84 finished CTA #1 (51666,0), 0 CTAs running
GPGPU-Sim uArch: Shader 84 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 92 finished CTA #1 (51668,0), 0 CTAs running
GPGPU-Sim uArch: Shader 92 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 115 finished CTA #0 (51668,0), 1 CTAs running
GPGPU-Sim uArch: Shader 24 finished CTA #1 (51670,0), 1 CTAs running
GPGPU-Sim uArch: Shader 24 finished CTA #0 (51674,0), 0 CTAs running
GPGPU-Sim uArch: Shader 24 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 3 finished CTA #1 (51678,0), 0 CTAs running
GPGPU-Sim uArch: Shader 3 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 59 finished CTA #1 (51683,0), 0 CTAs running
GPGPU-Sim uArch: Shader 59 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 8 finished CTA #1 (51687,0), 0 CTAs running
GPGPU-Sim uArch: Shader 8 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 115 finished CTA #1 (51688,0), 0 CTAs running
GPGPU-Sim uArch: Shader 115 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 85 finished CTA #0 (51689,0), 1 CTAs running
GPGPU-Sim uArch: Shader 111 finished CTA #0 (51699,0), 0 CTAs running
GPGPU-Sim uArch: Shader 111 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 116 finished CTA #0 (51699,0), 1 CTAs running
GPGPU-Sim uArch: Shader 34 finished CTA #1 (51701,0), 0 CTAs running
GPGPU-Sim uArch: Shader 34 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 54 finished CTA #1 (51715,0), 0 CTAs running
GPGPU-Sim uArch: Shader 54 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 104 finished CTA #1 (51715,0), 0 CTAs running
GPGPU-Sim uArch: Shader 104 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 48 finished CTA #0 (51730,0), 0 CTAs running
GPGPU-Sim uArch: Shader 48 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 20 finished CTA #1 (51739,0), 0 CTAs running
GPGPU-Sim uArch: Shader 20 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 11 finished CTA #1 (51741,0), 0 CTAs running
GPGPU-Sim uArch: Shader 11 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 97 finished CTA #1 (51748,0), 0 CTAs running
GPGPU-Sim uArch: Shader 97 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 113 finished CTA #0 (51749,0), 1 CTAs running
GPGPU-Sim uArch: Shader 113 finished CTA #1 (51751,0), 0 CTAs running
GPGPU-Sim uArch: Shader 113 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 27 finished CTA #1 (51754,0), 0 CTAs running
GPGPU-Sim uArch: Shader 27 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 53 finished CTA #0 (51754,0), 0 CTAs running
GPGPU-Sim uArch: Shader 53 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 105 finished CTA #0 (51756,0), 1 CTAs running
GPGPU-Sim uArch: Shader 43 finished CTA #0 (51767,0), 1 CTAs running
GPGPU-Sim uArch: Shader 36 finished CTA #1 (51771,0), 0 CTAs running
GPGPU-Sim uArch: Shader 36 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 105 finished CTA #1 (51776,0), 0 CTAs running
GPGPU-Sim uArch: Shader 105 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 45 finished CTA #0 (51783,0), 1 CTAs running
GPGPU-Sim uArch: Shader 93 finished CTA #1 (51788,0), 0 CTAs running
GPGPU-Sim uArch: Shader 93 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 67 finished CTA #0 (51790,0), 1 CTAs running
GPGPU-Sim uArch: Shader 50 finished CTA #1 (51794,0), 0 CTAs running
GPGPU-Sim uArch: Shader 50 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 67 finished CTA #1 (51796,0), 0 CTAs running
GPGPU-Sim uArch: Shader 67 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 7 finished CTA #0 (51797,0), 0 CTAs running
GPGPU-Sim uArch: Shader 7 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 19 finished CTA #0 (51797,0), 1 CTAs running
GPGPU-Sim uArch: Shader 74 finished CTA #1 (51800,0), 0 CTAs running
GPGPU-Sim uArch: Shader 74 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 43 finished CTA #1 (51801,0), 0 CTAs running
GPGPU-Sim uArch: Shader 43 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 102 finished CTA #1 (51802,0), 0 CTAs running
GPGPU-Sim uArch: Shader 102 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 79 finished CTA #0 (51810,0), 0 CTAs running
GPGPU-Sim uArch: Shader 79 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 98 finished CTA #1 (51811,0), 0 CTAs running
GPGPU-Sim uArch: Shader 98 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 85 finished CTA #1 (51813,0), 0 CTAs running
GPGPU-Sim uArch: Shader 85 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 51 finished CTA #0 (51824,0), 1 CTAs running
GPGPU-Sim uArch: Shader 66 finished CTA #1 (51833,0), 0 CTAs running
GPGPU-Sim uArch: Shader 66 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 12 finished CTA #0 (51840,0), 1 CTAs running
GPGPU-Sim uArch: Shader 35 finished CTA #1 (51840,0), 0 CTAs running
GPGPU-Sim uArch: Shader 35 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 44 finished CTA #0 (51841,0), 1 CTAs running
GPGPU-Sim uArch: Shader 101 finished CTA #1 (51842,0), 1 CTAs running
GPGPU-Sim uArch: Shader 64 finished CTA #0 (51847,0), 0 CTAs running
GPGPU-Sim uArch: Shader 64 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 37 finished CTA #1 (51851,0), 0 CTAs running
GPGPU-Sim uArch: Shader 37 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 77 finished CTA #0 (51851,0), 1 CTAs running
GPGPU-Sim uArch: Shader 51 finished CTA #1 (51855,0), 0 CTAs running
GPGPU-Sim uArch: Shader 51 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 109 finished CTA #1 (51857,0), 0 CTAs running
GPGPU-Sim uArch: Shader 109 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 12 finished CTA #1 (51860,0), 0 CTAs running
GPGPU-Sim uArch: Shader 12 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 6 finished CTA #1 (51862,0), 0 CTAs running
GPGPU-Sim uArch: Shader 6 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 116 finished CTA #1 (51867,0), 0 CTAs running
GPGPU-Sim uArch: Shader 116 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 101 finished CTA #0 (51870,0), 0 CTAs running
GPGPU-Sim uArch: Shader 101 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 117 finished CTA #1 (51871,0), 1 CTAs running
GPGPU-Sim uArch: Shader 38 finished CTA #0 (51881,0), 1 CTAs running
GPGPU-Sim uArch: Shader 19 finished CTA #1 (51882,0), 0 CTAs running
GPGPU-Sim uArch: Shader 19 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 38 finished CTA #1 (51887,0), 0 CTAs running
GPGPU-Sim uArch: Shader 38 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 117 finished CTA #0 (51887,0), 0 CTAs running
GPGPU-Sim uArch: Shader 117 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 1 finished CTA #0 (51902,0), 1 CTAs running
GPGPU-Sim uArch: Shader 44 finished CTA #1 (51906,0), 0 CTAs running
GPGPU-Sim uArch: Shader 44 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 14 finished CTA #0 (51907,0), 1 CTAs running
GPGPU-Sim uArch: Shader 21 finished CTA #1 (51908,0), 0 CTAs running
GPGPU-Sim uArch: Shader 21 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 14 finished CTA #1 (51913,0), 0 CTAs running
GPGPU-Sim uArch: Shader 14 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 1 finished CTA #1 (51923,0), 0 CTAs running
GPGPU-Sim uArch: Shader 1 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 22 finished CTA #1 (51928,0), 0 CTAs running
GPGPU-Sim uArch: Shader 22 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 100 finished CTA #0 (51933,0), 1 CTAs running
GPGPU-Sim uArch: Shader 4 finished CTA #0 (51939,0), 1 CTAs running
GPGPU-Sim uArch: Shader 110 finished CTA #0 (51939,0), 1 CTAs running
GPGPU-Sim uArch: Shader 78 finished CTA #1 (51944,0), 0 CTAs running
GPGPU-Sim uArch: Shader 78 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 108 finished CTA #1 (51951,0), 0 CTAs running
GPGPU-Sim uArch: Shader 108 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 45 finished CTA #1 (51964,0), 0 CTAs running
GPGPU-Sim uArch: Shader 45 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 68 finished CTA #0 (51966,0), 1 CTAs running
GPGPU-Sim uArch: Shader 77 finished CTA #1 (51966,0), 0 CTAs running
GPGPU-Sim uArch: Shader 77 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 68 finished CTA #1 (51969,0), 0 CTAs running
GPGPU-Sim uArch: Shader 68 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 118 finished CTA #0 (51986,0), 1 CTAs running
GPGPU-Sim uArch: cycles simulated: 52000  inst.: 31185000 (ipc=599.7) sim_rate=271173 (inst/sec) elapsed = 0:0:01:55 / Mon Jun 14 16:16:44 2021
GPGPU-Sim uArch: Shader 70 finished CTA #0 (52000,0), 1 CTAs running
GPGPU-Sim uArch: Shader 46 finished CTA #0 (52004,0), 1 CTAs running
GPGPU-Sim uArch: Shader 4 finished CTA #1 (52012,0), 0 CTAs running
GPGPU-Sim uArch: Shader 4 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 100 finished CTA #1 (52016,0), 0 CTAs running
GPGPU-Sim uArch: Shader 100 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 110 finished CTA #1 (52020,0), 0 CTAs running
GPGPU-Sim uArch: Shader 110 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 118 finished CTA #1 (52024,0), 0 CTAs running
GPGPU-Sim uArch: Shader 118 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 5 finished CTA #0 (52044,0), 1 CTAs running
GPGPU-Sim uArch: Shader 46 finished CTA #1 (52044,0), 0 CTAs running
GPGPU-Sim uArch: Shader 46 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 69 finished CTA #0 (52048,0), 1 CTAs running
GPGPU-Sim uArch: Shader 5 finished CTA #1 (52061,0), 0 CTAs running
GPGPU-Sim uArch: Shader 5 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 69 finished CTA #1 (52107,0), 0 CTAs running
GPGPU-Sim uArch: Shader 69 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 70 finished CTA #1 (52113,0), 0 CTAs running
GPGPU-Sim uArch: Shader 70 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: GPU detected kernel '_Z14matrix_mul_gpuPiS_S_i' finished on shader 70.
kernel_name = _Z14matrix_mul_gpuPiS_S_i 
kernel_launch_uid = 1 
gpu_sim_cycle = 52114
gpu_sim_insn = 31185000
gpu_ipc =     598.3997
gpu_tot_sim_cycle = 52114
gpu_tot_sim_insn = 31185000
gpu_tot_ipc =     598.3997
gpu_tot_issued_cta = 0
gpu_stall_dramfull = 74963
gpu_stall_icnt2sh    = 150961
gpu_total_sim_rate=271173

========= Core cache stats =========
L1I_cache:
	L1I_total_cache_accesses = 696598
	L1I_total_cache_misses = 3598
	L1I_total_cache_miss_rate = 0.0052
	L1I_total_cache_pending_hits = 0
	L1I_total_cache_reservation_fails = 0
L1D_cache:
	L1D_cache_core[0]: Access = 40337, Miss = 2556, Miss_rate = 0.063, Pending_hits = 9587, Reservation_fails = 17954
	L1D_cache_core[1]: Access = 40291, Miss = 2547, Miss_rate = 0.063, Pending_hits = 9564, Reservation_fails = 17242
	L1D_cache_core[2]: Access = 40297, Miss = 2561, Miss_rate = 0.064, Pending_hits = 9568, Reservation_fails = 19637
	L1D_cache_core[3]: Access = 40231, Miss = 2551, Miss_rate = 0.063, Pending_hits = 9512, Reservation_fails = 20346
	L1D_cache_core[4]: Access = 40282, Miss = 2570, Miss_rate = 0.064, Pending_hits = 9553, Reservation_fails = 20125
	L1D_cache_core[5]: Access = 40276, Miss = 2563, Miss_rate = 0.064, Pending_hits = 9550, Reservation_fails = 21507
	L1D_cache_core[6]: Access = 40282, Miss = 2561, Miss_rate = 0.064, Pending_hits = 9549, Reservation_fails = 17583
	L1D_cache_core[7]: Access = 40322, Miss = 2555, Miss_rate = 0.063, Pending_hits = 9582, Reservation_fails = 19592
	L1D_cache_core[8]: Access = 40328, Miss = 2556, Miss_rate = 0.063, Pending_hits = 9572, Reservation_fails = 18611
	L1D_cache_core[9]: Access = 40323, Miss = 2556, Miss_rate = 0.063, Pending_hits = 9579, Reservation_fails = 17753
	L1D_cache_core[10]: Access = 40328, Miss = 2568, Miss_rate = 0.064, Pending_hits = 9588, Reservation_fails = 19928
	L1D_cache_core[11]: Access = 40322, Miss = 2570, Miss_rate = 0.064, Pending_hits = 9591, Reservation_fails = 18029
	L1D_cache_core[12]: Access = 40328, Miss = 2580, Miss_rate = 0.064, Pending_hits = 9588, Reservation_fails = 17457
	L1D_cache_core[13]: Access = 40338, Miss = 2572, Miss_rate = 0.064, Pending_hits = 9599, Reservation_fails = 17935
	L1D_cache_core[14]: Access = 40343, Miss = 2570, Miss_rate = 0.064, Pending_hits = 9605, Reservation_fails = 16746
	L1D_total_cache_accesses = 604628
	L1D_total_cache_misses = 38436
	L1D_total_cache_miss_rate = 0.0636
	L1D_total_cache_pending_hits = 143587
	L1D_total_cache_reservation_fails = 280445
	L1D_cache_data_port_util = 0.068
	L1D_cache_fill_port_util = 0.006
L1C_cache:
	L1C_total_cache_accesses = 3600
	L1C_total_cache_misses = 900
	L1C_total_cache_miss_rate = 0.2500
	L1C_total_cache_pending_hits = 0
	L1C_total_cache_reservation_fails = 0
L1T_cache:
	L1T_total_cache_accesses = 0
	L1T_total_cache_misses = 0
	L1T_total_cache_pending_hits = 0
	L1T_total_cache_reservation_fails = 0

Total_core_cache_stats:
	Total_core_cache_stats_breakdown[GLOBAL_ACC_R][HIT] = 422605
	Total_core_cache_stats_breakdown[GLOBAL_ACC_R][HIT_RESERVED] = 143587
	Total_core_cache_stats_breakdown[GLOBAL_ACC_R][MISS] = 34993
	Total_core_cache_stats_breakdown[GLOBAL_ACC_R][RESERVATION_FAIL] = 135803
	Total_core_cache_stats_breakdown[CONST_ACC_R][HIT] = 2700
	Total_core_cache_stats_breakdown[CONST_ACC_R][MISS] = 900
	Total_core_cache_stats_breakdown[GLOBAL_ACC_W][MISS] = 3443
	Total_core_cache_stats_breakdown[GLOBAL_ACC_W][RESERVATION_FAIL] = 144642
	Total_core_cache_stats_breakdown[INST_ACC_R][HIT] = 693000
	Total_core_cache_stats_breakdown[INST_ACC_R][MISS] = 3598
Shader 0 warp_id issue distribution:
warp_id:
0, 1, 2, 3, 4, 5, 6, 7, 
distro:
1388, 1388, 1388, 1388, 1388, 1388, 1388, 1388, 
gpgpu_n_tot_thrd_icount = 39974400
gpgpu_n_tot_w_icount = 1249200
gpgpu_n_stall_shd_mem = 614173
gpgpu_n_mem_read_local = 0
gpgpu_n_mem_write_local = 0
gpgpu_n_mem_read_global = 34993
gpgpu_n_mem_write_global = 3443
gpgpu_n_mem_texture = 0
gpgpu_n_mem_const = 120
gpgpu_n_load_insn  = 6750000
gpgpu_n_store_insn = 22500
gpgpu_n_shmem_insn = 0
gpgpu_n_tex_insn = 0
gpgpu_n_const_mem_insn = 0
gpgpu_n_param_mem_insn = 90000
gpgpu_n_shmem_bkconflict = 0
gpgpu_n_cache_bkconflict = 0
gpgpu_n_intrawarp_mshr_merge = 0
gpgpu_n_cmem_portconflict = 0
gpgpu_stall_shd_mem[c_mem][bk_conf] = 0
gpgpu_stall_shd_mem[c_mem][mshr_rc] = 0
gpgpu_stall_shd_mem[c_mem][icnt_rc] = 0
gpgpu_stall_shd_mem[c_mem][data_port_stall] = 0
gpgpu_stall_shd_mem[t_mem][mshr_rc] = 0
gpgpu_stall_shd_mem[t_mem][icnt_rc] = 0
gpgpu_stall_shd_mem[t_mem][data_port_stall] = 0
gpgpu_stall_shd_mem[s_mem][bk_conf] = 0
gpgpu_stall_shd_mem[gl_mem][bk_conf] = 0
gpgpu_stall_shd_mem[gl_mem][coal_stall] = 614173
gpgpu_stall_shd_mem[gl_mem][data_port_stall] = 0
gpgpu_stall_shd_mem[g_mem_ld][mshr_rc] = 0
gpgpu_stall_shd_mem[g_mem_ld][icnt_rc] = 0
gpgpu_stall_shd_mem[g_mem_ld][wb_icnt_rc] = 0
gpgpu_stall_shd_mem[g_mem_ld][wb_rsrv_fail] = 0
gpgpu_stall_shd_mem[g_mem_st][mshr_rc] = 0
gpgpu_stall_shd_mem[g_mem_st][icnt_rc] = 0
gpgpu_stall_shd_mem[g_mem_st][wb_icnt_rc] = 0
gpgpu_stall_shd_mem[g_mem_st][wb_rsrv_fail] = 0
gpgpu_stall_shd_mem[l_mem_ld][mshr_rc] = 0
gpgpu_stall_shd_mem[l_mem_ld][icnt_rc] = 0
gpgpu_stall_shd_mem[l_mem_ld][wb_icnt_rc] = 0
gpgpu_stall_shd_mem[l_mem_ld][wb_rsrv_fail] = 0
gpgpu_stall_shd_mem[l_mem_st][mshr_rc] = 0
gpgpu_stall_shd_mem[l_mem_st][icnt_rc] = 0
gpgpu_stall_shd_mem[l_mem_st][wb_icnt_rc] = 0
gpgpu_stall_shd_mem[l_mem_st][wb_rsrv_fail] = 0
gpu_reg_bank_conflict_stalls = 0
Warp Occupancy Distribution:
Stall:441805	W0_Idle:1288598	W0_Scoreboard:9482477	W1:0	W2:0	W3:0	W4:312300	W5:0	W6:0	W7:0	W8:0	W9:0	W10:0	W11:0	W12:0	W13:0	W14:0	W15:0	W16:0	W17:0	W18:0	W19:0	W20:0	W21:0	W22:0	W23:0	W24:0	W25:0	W26:0	W27:0	W28:0	W29:0	W30:0	W31:0	W32:936900
traffic_breakdown_coretomem[CONST_ACC_R] = 960 {8:120,}
traffic_breakdown_coretomem[GLOBAL_ACC_R] = 279944 {8:34993,}
traffic_breakdown_coretomem[GLOBAL_ACC_W] = 220472 {40:1891,72:1035,136:517,}
traffic_breakdown_coretomem[INST_ACC_R] = 3840 {8:480,}
traffic_breakdown_memtocore[CONST_ACC_R] = 8640 {72:120,}
traffic_breakdown_memtocore[GLOBAL_ACC_R] = 4759048 {136:34993,}
traffic_breakdown_memtocore[GLOBAL_ACC_W] = 27544 {8:3443,}
traffic_breakdown_memtocore[INST_ACC_R] = 65280 {136:480,}
maxmrqlatency = 264 
maxdqlatency = 0 
maxmflatency = 2966 
averagemflatency = 331 
max_icnt2mem_latency = 3027 
max_icnt2sh_latency = 52113 
mrq_lat_table:1864 	85 	65 	133 	191 	266 	357 	250 	1 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
dq_lat_table:0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
mf_lat_table:0 	0 	0 	0 	0 	0 	0 	18641 	15624 	2440 	1703 	148 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
icnt2mem_lat_table:0 	0 	0 	25799 	1252 	2017 	3082 	2408 	1558 	1817 	996 	107 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
icnt2sh_lat_table:0 	0 	0 	5374 	26785 	2935 	19 	0 	0 	0 	0 	0 	0 	0 	0 	3443 	0 	0 	0 	0 	0 	0 	0 	0 	
mf_lat_pw_table:0 	0 	0 	0 	0 	0 	0 	80 	12 	6 	4 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
maximum concurrent accesses to same row:
dram[0]:         1         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[1]:         2         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[2]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[3]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[4]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[5]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
maximum service time to same row:
dram[0]:     48986     50145         0         0      2487      2476      2843      3294      2488      3444     15918     16579     40804     41559     49682     49938 
dram[1]:     47947     50759         0         0      2569      2004      2601      2463      2029      3357     15954     16900     40935     41738     49974     49682 
dram[2]:     50116     49853         0         0      2013      2401      2741      2529      1994      3279     16219     16887     41074     41907     49691     49690 
dram[3]:     49710     50845         0         0      2009      2478      2429      2621      2751      3897     16263     17171     41160     41972     49677     49878 
dram[4]:     50237     49850         0         0      2488      2003      3332      2538      2485      3278     16469     17200     41247     42166     49919     49687 
dram[5]:     49859     50068         0         0      2007      2513      2469      3322      3301      3935     16591     17199     41469     42218     49687     49888 
average row accesses per activate:
dram[0]:  4.250000 14.000000      -nan      -nan 10.000000 10.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 64.000000 72.000000 83.000000 77.000000 
dram[1]:  7.000000 14.000000      -nan      -nan 10.000000 10.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 77.000000 61.000000 82.000000 85.000000 
dram[2]: 14.000000  9.000000      -nan      -nan 10.000000 12.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 63.000000 63.000000 86.000000 81.000000 
dram[3]: 13.000000  9.000000      -nan      -nan 10.000000 12.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 63.000000 68.000000 75.000000 85.000000 
dram[4]: 15.000000 11.000000      -nan      -nan 10.000000 12.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 68.000000 65.000000 78.000000 88.000000 
dram[5]: 16.000000  8.000000      -nan      -nan 10.000000 12.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 65.000000 67.000000 81.000000 81.000000 
average row locality = 3212/88 = 36.500000
number of total memory accesses made:
dram[0]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[1]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[2]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[3]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[4]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[5]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
total accesses: 0
min_bank_accesses = 0!
min_chip_accesses = 0!
number of total read accesses:
dram[0]:         9         6         0         0        10        10        32        32        32        32        32        32        32        32        32        32 
dram[1]:         8         6         0         0        10        10        32        32        32        32        32        32        32        32        32        32 
dram[2]:         6         4         0         0        10        12        32        32        32        32        32        32        32        32        32        32 
dram[3]:         6         4         0         0        10        12        32        32        32        32        32        32        32        32        32        32 
dram[4]:         6         4         0         0        10        12        32        32        32        32        32        32        32        32        32        32 
dram[5]:         6         4         0         0        10        12        32        32        32        32        32        32        32        32        32        32 
total reads: 2117
min_bank_accesses = 0!
chip skew: 355/352 = 1.01
number of total write accesses:
dram[0]:         8         8         0         0         0         0         0         0         0         0         0         0        32        40        51        45 
dram[1]:         6         8         0         0         0         0         0         0         0         0         0         0        45        29        50        53 
dram[2]:         8         5         0         0         0         0         0         0         0         0         0         0        31        31        54        49 
dram[3]:         7         5         0         0         0         0         0         0         0         0         0         0        31        36        43        53 
dram[4]:         9         7         0         0         0         0         0         0         0         0         0         0        36        33        46        56 
dram[5]:        10         4         0         0         0         0         0         0         0         0         0         0        33        35        49        49 
total writes: 1095
min_bank_accesses = 0!
chip skew: 191/175 = 1.09
average mf latency per bank:
dram[0]:       5972      1408    none      none        7624      6015      7869      7128      9772      7109      8243      8142      2583      2276      1126      1269
dram[1]:        832      1099    none      none        4678      6681      8215      6378      8926      6471      8058      8269      2353      2426      1412      1324
dram[2]:       1016       863    none      none        7296      5077      8207      5996      7922      6544      8246      8380      2536      2313      1209      1387
dram[3]:       1202       900    none      none        3901      6944      7219      6985      7134      7113      8836      8254      2245      2250      1322      1180
dram[4]:        954      1031    none      none        7742      4595      6331      6396      6875      6570      8289      8526      2078      2526      1299      1065
dram[5]:       1138      1430    none      none        4617      6835      5760      7269      6703      6460      7953      8525      2352      2337      1205      1207
maximum mf latency per bank:
dram[0]:       1445      1801         0         0      1719      1494      2630      2529      2515      1555       378       394      1663      1710      1771      1766
dram[1]:       1412      1863         0         0      1419      1625      2966      1991      2270      1260       385       394      1617      1737      1882      1992
dram[2]:       1748      1134         0         0      1929      1216      2610      2175      2008      1289       410       388      1659      1761      1974      1745
dram[3]:       1799      1558         0         0       615      1704      2483      1767      1313      1553       427       428      1645      1618      1735      1982
dram[4]:       1846      1376         0         0      1908       939      2307      2711      1561      1338       425       399      1707      1747      1971      1751
dram[5]:       1873      1435         0         0       784      1584      1609      2693      1359      1522       431       406      1760      1622      1768      1979

Number of Memory Banks Accessed per Memory Operation per Warp (from 0):
0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	
Average # of Memory Banks Accessed per Memory Operation per Warp=-nan

position of mrq chosen
0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	

average position of mrq chosen = -nan
Memory Partition 0: 
Cache L2_bank_000:
MSHR contents

Cache L2_bank_001:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[0]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67855 n_act=17 n_pre=3 n_req=539 n_rd=710 n_write=205 bw_util=0.0266
n_activity=5774 dram_eff=0.3169
bk0: 18a 68542i bk1: 12a 68532i bk2: 0a 68787i bk3: 0a 68789i bk4: 20a 68735i bk5: 20a 68735i bk6: 64a 68637i bk7: 64a 68637i bk8: 64a 68640i bk9: 64a 68645i bk10: 64a 68647i bk11: 64a 68647i bk12: 64a 67873i bk13: 64a 67623i bk14: 64a 67349i bk15: 64a 67377i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.30346
Memory Partition 1: 
Cache L2_bank_002:
MSHR contents

Cache L2_bank_003:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[1]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67854 n_act=15 n_pre=1 n_req=545 n_rd=708 n_write=212 bw_util=0.02675
n_activity=5886 dram_eff=0.3126
bk0: 16a 68532i bk1: 12a 68588i bk2: 0a 68788i bk3: 0a 68788i bk4: 20a 68735i bk5: 20a 68733i bk6: 64a 68638i bk7: 64a 68644i bk8: 64a 68647i bk9: 64a 68644i bk10: 64a 68645i bk11: 64a 68651i bk12: 64a 67904i bk13: 64a 67987i bk14: 64a 67318i bk15: 64a 67259i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.334918
Memory Partition 2: 
Cache L2_bank_004:
MSHR contents

Cache L2_bank_005:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[2]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67873 n_act=14 n_pre=0 n_req=530 n_rd=704 n_write=199 bw_util=0.02625
n_activity=5889 dram_eff=0.3067
bk0: 12a 68632i bk1: 8a 68675i bk2: 0a 68788i bk3: 0a 68789i bk4: 20a 68736i bk5: 24a 68729i bk6: 64a 68644i bk7: 64a 68646i bk8: 64a 68647i bk9: 64a 68651i bk10: 64a 68648i bk11: 64a 68648i bk12: 64a 67950i bk13: 64a 67719i bk14: 64a 67369i bk15: 64a 67364i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.253743
Memory Partition 3: 
Cache L2_bank_006:
MSHR contents

Cache L2_bank_007:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[3]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67880 n_act=14 n_pre=0 n_req=527 n_rd=704 n_write=192 bw_util=0.02605
n_activity=5725 dram_eff=0.313
bk0: 12a 68629i bk1: 8a 68661i bk2: 0a 68786i bk3: 0a 68786i bk4: 20a 68738i bk5: 24a 68729i bk6: 64a 68642i bk7: 64a 68643i bk8: 64a 68643i bk9: 64a 68649i bk10: 64a 68638i bk11: 64a 68650i bk12: 64a 67791i bk13: 64a 67660i bk14: 64a 67449i bk15: 64a 67385i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.27991
Memory Partition 4: 
Cache L2_bank_008:
MSHR contents

Cache L2_bank_009:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[4]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67869 n_act=14 n_pre=0 n_req=539 n_rd=704 n_write=203 bw_util=0.02637
n_activity=5871 dram_eff=0.309
bk0: 12a 68578i bk1: 8a 68708i bk2: 0a 68790i bk3: 0a 68790i bk4: 20a 68735i bk5: 24a 68730i bk6: 64a 68649i bk7: 64a 68649i bk8: 64a 68645i bk9: 64a 68641i bk10: 64a 68643i bk11: 64a 68647i bk12: 64a 67633i bk13: 64a 67711i bk14: 64a 67323i bk15: 64a 67376i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.317125
Memory Partition 5: 
Cache L2_bank_010:
MSHR contents

Cache L2_bank_011:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[5]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67872 n_act=14 n_pre=0 n_req=532 n_rd=704 n_write=200 bw_util=0.02628
n_activity=5855 dram_eff=0.3088
bk0: 12a 68631i bk1: 8a 68658i bk2: 0a 68787i bk3: 0a 68787i bk4: 20a 68731i bk5: 24a 68730i bk6: 64a 68647i bk7: 64a 68643i bk8: 64a 68645i bk9: 64a 68645i bk10: 64a 68648i bk11: 64a 68648i bk12: 64a 67817i bk13: 64a 67748i bk14: 64a 67487i bk15: 64a 67461i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.286524

========= L2 cache stats =========
L2_cache_bank[0]: Access = 3621, Miss = 179, Miss_rate = 0.049, Pending_hits = 321, Reservation_fails = 4452
L2_cache_bank[1]: Access = 3181, Miss = 176, Miss_rate = 0.055, Pending_hits = 296, Reservation_fails = 4350
L2_cache_bank[2]: Access = 3450, Miss = 178, Miss_rate = 0.052, Pending_hits = 315, Reservation_fails = 4369
L2_cache_bank[3]: Access = 3206, Miss = 176, Miss_rate = 0.055, Pending_hits = 296, Reservation_fails = 4198
L2_cache_bank[4]: Access = 3229, Miss = 176, Miss_rate = 0.055, Pending_hits = 297, Reservation_fails = 4050
L2_cache_bank[5]: Access = 3178, Miss = 176, Miss_rate = 0.055, Pending_hits = 292, Reservation_fails = 3913
L2_cache_bank[6]: Access = 3180, Miss = 176, Miss_rate = 0.055, Pending_hits = 301, Reservation_fails = 4277
L2_cache_bank[7]: Access = 3209, Miss = 176, Miss_rate = 0.055, Pending_hits = 306, Reservation_fails = 3642
L2_cache_bank[8]: Access = 3181, Miss = 176, Miss_rate = 0.055, Pending_hits = 298, Reservation_fails = 3777
L2_cache_bank[9]: Access = 3209, Miss = 176, Miss_rate = 0.055, Pending_hits = 308, Reservation_fails = 4112
L2_cache_bank[10]: Access = 3193, Miss = 176, Miss_rate = 0.055, Pending_hits = 290, Reservation_fails = 3935
L2_cache_bank[11]: Access = 3199, Miss = 176, Miss_rate = 0.055, Pending_hits = 301, Reservation_fails = 4179
L2_total_cache_accesses = 39036
L2_total_cache_misses = 2117
L2_total_cache_miss_rate = 0.0542
L2_total_cache_pending_hits = 3621
L2_total_cache_reservation_fails = 49254
L2_total_cache_breakdown:
	L2_cache_stats_breakdown[GLOBAL_ACC_R][HIT] = 30369
	L2_cache_stats_breakdown[GLOBAL_ACC_R][HIT_RESERVED] = 3216
	L2_cache_stats_breakdown[GLOBAL_ACC_R][MISS] = 1408
	L2_cache_stats_breakdown[GLOBAL_ACC_R][RESERVATION_FAIL] = 48229
	L2_cache_stats_breakdown[CONST_ACC_R][HIT] = 116
	L2_cache_stats_breakdown[CONST_ACC_R][HIT_RESERVED] = 3
	L2_cache_stats_breakdown[CONST_ACC_R][MISS] = 1
	L2_cache_stats_breakdown[CONST_ACC_R][RESERVATION_FAIL] = 129
	L2_cache_stats_breakdown[GLOBAL_ACC_W][HIT] = 2348
	L2_cache_stats_breakdown[GLOBAL_ACC_W][HIT_RESERVED] = 391
	L2_cache_stats_breakdown[GLOBAL_ACC_W][MISS] = 704
	L2_cache_stats_breakdown[GLOBAL_ACC_W][RESERVATION_FAIL] = 552
	L2_cache_stats_breakdown[INST_ACC_R][HIT] = 465
	L2_cache_stats_breakdown[INST_ACC_R][HIT_RESERVED] = 11
	L2_cache_stats_breakdown[INST_ACC_R][MISS] = 4
	L2_cache_stats_breakdown[INST_ACC_R][RESERVATION_FAIL] = 344
L2_cache_data_port_util = 0.204
L2_cache_fill_port_util = 0.014

icnt_total_pkts_mem_to_simt=181168
icnt_total_pkts_simt_to_mem=45065
LD_mem_lat_dist  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ST_mem_lat_dist  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
----------------------------Interconnect-DETAILS--------------------------------
Class 0:
Packet latency average = 26.5755
	minimum = 6
	maximum = 856
Network latency average = 18.7063
	minimum = 6
	maximum = 797
Slowest packet = 1042
Flit latency average = 13.4921
	minimum = 6
	maximum = 797
Slowest flit = 2494
Fragmentation average = 0.0075187
	minimum = 0
	maximum = 332
Injected packet rate average = 0.0554852
	minimum = 0.0496412 (at node 1)
	maximum = 0.0694823 (at node 15)
Accepted packet rate average = 0.0554852
	minimum = 0.0496412 (at node 1)
	maximum = 0.0694823 (at node 15)
Injected flit rate average = 0.160782
	minimum = 0.0571056 (at node 1)
	maximum = 0.320317 (at node 15)
Accepted flit rate average= 0.160782
	minimum = 0.0703074 (at node 21)
	maximum = 0.233181 (at node 12)
Injected packet length average = 2.89775
Accepted packet length average = 2.89775
Total in-flight flits = 0 (0 measured)
====== Overall Traffic Statistics ======
====== Traffic class 0 ======
Packet latency average = 26.5755 (1 samples)
	minimum = 6 (1 samples)
	maximum = 856 (1 samples)
Network latency average = 18.7063 (1 samples)
	minimum = 6 (1 samples)
	maximum = 797 (1 samples)
Flit latency average = 13.4921 (1 samples)
	minimum = 6 (1 samples)
	maximum = 797 (1 samples)
Fragmentation average = 0.0075187 (1 samples)
	minimum = 0 (1 samples)
	maximum = 332 (1 samples)
Injected packet rate average = 0.0554852 (1 samples)
	minimum = 0.0496412 (1 samples)
	maximum = 0.0694823 (1 samples)
Accepted packet rate average = 0.0554852 (1 samples)
	minimum = 0.0496412 (1 samples)
	maximum = 0.0694823 (1 samples)
Injected flit rate average = 0.160782 (1 samples)
	minimum = 0.0571056 (1 samples)
	maximum = 0.320317 (1 samples)
Accepted flit rate average = 0.160782 (1 samples)
	minimum = 0.0703074 (1 samples)
	maximum = 0.233181 (1 samples)
Injected packet size average = 2.89775 (1 samples)
Accepted packet size average = 2.89775 (1 samples)
Hops average = 1 (1 samples)
----------------------------END-of-Interconnect-DETAILS-------------------------


gpgpu_simulation_time = 0 days, 0 hrs, 1 min, 55 sec (115 sec)
gpgpu_simulation_rate = 271173 (inst/sec)
gpgpu_simulation_rate = 453 (cycle/sec)
total time is 115176 ms


        *** GPGPU-Sim Simulator Version 3.2.2  [build 0] ***


GPGPU-Sim PTX: simulation mode 0 (can change with PTX_SIM_MODE_FUNC environment variable:
               1=functional simulation only, 0=detailed performance simulator)
GPGPU-Sim: Configuration options:

-network_mode                           1 # Interconnection network mode
-inter_config_file   config_fermi_islip.icnt # Interconnection network config file
-gpgpu_ptx_use_cuobjdump                    1 # Use cuobjdump to extract ptx and sass from binaries
-gpgpu_experimental_lib_support                    0 # Try to extract code from cuda libraries [Broken because of unknown cudaGetExportTable]
-gpgpu_ptx_convert_to_ptxplus                    0 # Convert SASS (native ISA) to ptxplus and run ptxplus
-gpgpu_ptx_force_max_capability                   20 # Force maximum compute capability
-gpgpu_ptx_inst_debug_to_file                    0 # Dump executed instructions' debug information to file
-gpgpu_ptx_inst_debug_file       inst_debug.txt # Executed instructions' debug output file
-gpgpu_ptx_inst_debug_thread_uid                    1 # Thread UID for executed instructions' debug output
-gpgpu_simd_model                       1 # 1 = post-dominator
-gpgpu_shader_core_pipeline              1536:32 # shader core pipeline config, i.e., {<nthread>:<warpsize>}
-gpgpu_tex_cache:l1  4:128:24,L:R:m:N:L,F:128:4,128:2 # per-shader L1 texture cache  (READ-ONLY) config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq>:<rf>}
-gpgpu_const_cache:l1 64:64:2,L:R:f:N:L,A:2:32,4 # per-shader L1 constant memory cache  (READ-ONLY) config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq>} 
-gpgpu_cache:il1     4:128:4,L:R:f:N:L,A:2:32,4 # shader L1 instruction cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq>} 
-gpgpu_cache:dl1     32:128:4,L:L:m:N:H,A:32:8,8 # per-shader L1 data cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq> | none}
-gpgpu_cache:dl1PrefL1                 none # per-shader L1 data cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq> | none}
-gpgpu_cache:dl1PreShared                 none # per-shader L1 data cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq> | none}
-gmem_skip_L1D                          0 # global memory access skip L1D cache (implements -Xptxas -dlcm=cg, default=no skip)
-gpgpu_perfect_mem                      0 # enable perfect memory mode (no cache miss)
-n_regfile_gating_group                    4 # group of lanes that should be read/written together)
-gpgpu_clock_gated_reg_file                    0 # enable clock gated reg file for power calculations
-gpgpu_clock_gated_lanes                    0 # enable clock gated lanes for power calculations
-gpgpu_shader_registers                32768 # Number of registers per shader core. Limits number of concurrent CTAs. (default 8192)
-gpgpu_shader_cta                       8 # Maximum number of concurrent CTAs in shader (default 8)
-gpgpu_num_cta_barriers                   16 # Maximum number of named barriers per CTA (default 16)
-gpgpu_n_clusters                      15 # number of processing clusters
-gpgpu_n_cores_per_cluster                    8 # number of simd cores per cluster
-gpgpu_n_cluster_ejection_buffer_size                    8 # number of packets in ejection buffer
-gpgpu_n_ldst_response_buffer_size                    2 # number of response packets in ld/st unit ejection buffer
-gpgpu_shmem_size                   16384 # Size of shared memory per shader core (default 16kB)
-gpgpu_shmem_size                   49152 # Size of shared memory per shader core (default 16kB)
-gpgpu_shmem_size_PrefL1                16384 # Size of shared memory per shader core (default 16kB)
-gpgpu_shmem_size_PrefShared                16384 # Size of shared memory per shader core (default 16kB)
-gpgpu_shmem_num_banks                   32 # Number of banks in the shared memory in each shader core (default 16)
-gpgpu_shmem_limited_broadcast                    0 # Limit shared memory to do one broadcast per cycle (default on)
-gpgpu_shmem_warp_parts                    1 # Number of portions a warp is divided into for shared memory bank conflict check 
-gpgpu_warpdistro_shader                   -1 # Specify which shader core to collect the warp size distribution from
-gpgpu_warp_issue_shader                    0 # Specify which shader core to collect the warp issue distribution from
-gpgpu_local_mem_map                    1 # Mapping from local memory space address to simulated GPU physical address space (default = enabled)
-gpgpu_num_reg_banks                   16 # Number of register banks (default = 8)
-gpgpu_reg_bank_use_warp_id                    0 # Use warp ID in mapping registers to banks (default = off)
-gpgpu_operand_collector_num_units_sp                    6 # number of collector units (default = 4)
-gpgpu_operand_collector_num_units_sfu                    8 # number of collector units (default = 4)
-gpgpu_operand_collector_num_units_mem                    2 # number of collector units (default = 2)
-gpgpu_operand_collector_num_units_gen                    0 # number of collector units (default = 0)
-gpgpu_operand_collector_num_in_ports_sp                    2 # number of collector unit in ports (default = 1)
-gpgpu_operand_collector_num_in_ports_sfu                    1 # number of collector unit in ports (default = 1)
-gpgpu_operand_collector_num_in_ports_mem                    1 # number of collector unit in ports (default = 1)
-gpgpu_operand_collector_num_in_ports_gen                    0 # number of collector unit in ports (default = 0)
-gpgpu_operand_collector_num_out_ports_sp                    2 # number of collector unit in ports (default = 1)
-gpgpu_operand_collector_num_out_ports_sfu                    1 # number of collector unit in ports (default = 1)
-gpgpu_operand_collector_num_out_ports_mem                    1 # number of collector unit in ports (default = 1)
-gpgpu_operand_collector_num_out_ports_gen                    0 # number of collector unit in ports (default = 0)
-gpgpu_coalesce_arch                   13 # Coalescing arch (default = 13, anything else is off for now)
-gpgpu_num_sched_per_core                    2 # Number of warp schedulers per core
-gpgpu_max_insn_issue_per_warp                    1 # Max number of instructions that can be issued per warp in one cycle by scheduler
-gpgpu_simt_core_sim_order                    1 # Select the simulation order of cores in a cluster (0=Fix, 1=Round-Robin)
-gpgpu_pipeline_widths        2,1,1,2,1,1,2 # Pipeline widths ID_OC_SP,ID_OC_SFU,ID_OC_MEM,OC_EX_SP,OC_EX_SFU,OC_EX_MEM,EX_WB
-gpgpu_num_sp_units                     2 # Number of SP units (default=1)
-gpgpu_num_sfu_units                    1 # Number of SF units (default=1)
-gpgpu_num_mem_units                    1 # Number if ldst units (default=1) WARNING: not hooked up to anything
-gpgpu_scheduler                      gto # Scheduler configuration: < lrr | gto | two_level_active > If two_level_active:<num_active_warps>:<inner_prioritization>:<outer_prioritization>For complete list of prioritization values see shader.h enum scheduler_prioritization_typeDefault: gto
-gpgpu_dram_scheduler                    1 # 0 = fifo, 1 = FR-FCFS (defaul)
-gpgpu_dram_partition_queues              8:8:8:8 # i2$:$2d:d2$:$2i
-l2_ideal                               0 # Use a ideal L2 cache that always hit
-gpgpu_cache:dl2     64:128:8,L:B:m:W:L,A:32:4,4:0,32 # unified banked L2 data cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq>}
-gpgpu_cache:dl2_texture_only                    0 # L2 cache used for texture only
-gpgpu_n_mem                            6 # number of memory modules (e.g. memory controllers) in gpu
-gpgpu_n_sub_partition_per_mchannel                    2 # number of memory subpartition in each memory module
-gpgpu_n_mem_per_ctrlr                    2 # number of memory chips per memory controller
-gpgpu_memlatency_stat                   14 # track and display latency statistics 0x2 enables MC, 0x4 enables queue logs
-gpgpu_frfcfs_dram_sched_queue_size                   16 # 0 = unlimited (default); # entries per chip
-gpgpu_dram_return_queue_size                  116 # 0 = unlimited (default); # entries per chip
-gpgpu_dram_buswidth                    4 # default = 4 bytes (8 bytes per cycle at DDR)
-gpgpu_dram_burst_length                    8 # Burst length of each DRAM request (default = 4 data bus cycle)
-dram_data_command_freq_ratio                    4 # Frequency ratio between DRAM data bus and command bus (default = 2 times, i.e. DDR)
-gpgpu_dram_timing_opt nbk=16:CCD=2:RRD=6:RCD=12:RAS=28:RP=12:RC=40: CL=12:WL=4:CDLR=5:WR=12:nbkgrp=4:CCDL=3:RTPL=2 # DRAM timing parameters = {nbk:tCCD:tRRD:tRCD:tRAS:tRP:tRC:CL:WL:tCDLR:tWR:nbkgrp:tCCDL:tRTPL}
-rop_latency                          120 # ROP queue latency (default 85)
-dram_latency                         100 # DRAM latency (default 30)
-gpgpu_mem_addr_mapping dramid@8;00000000.00000000.00000000.00000000.0000RRRR.RRRRRRRR.BBBCCCCB.CCSSSSSS # mapping memory address to dram model {dramid@<start bit>;<memory address map>}
-gpgpu_mem_addr_test                    0 # run sweep test to check address mapping for aliased address
-gpgpu_mem_address_mask                    1 # 0 = old addressing mask, 1 = new addressing mask, 2 = new add. mask + flipped bank sel and chip sel bits
-gpuwattch_xml_file  gpuwattch_gtx480.xml # GPUWattch XML file
-power_simulation_enabled                    1 # Turn on power simulator (1=On, 0=Off)
-power_per_cycle_dump                    0 # Dump detailed power output each cycle
-power_trace_enabled                    0 # produce a file for the power trace (1=On, 0=Off)
-power_trace_zlevel                     6 # Compression level of the power trace output log (0=no comp, 9=highest)
-steady_power_levels_enabled                    0 # produce a file for the steady power levels (1=On, 0=Off)
-steady_state_definition                  8:4 # allowed deviation:number of samples
-gpgpu_max_cycle                        0 # terminates gpu simulation early (0 = no limit)
-gpgpu_max_insn                         0 # terminates gpu simulation early (0 = no limit)
-gpgpu_max_cta                          0 # terminates gpu simulation early (0 = no limit)
-gpgpu_runtime_stat                   500 # display runtime statistics such as dram utilization {<freq>:<flag>}
-liveness_message_freq                    1 # Minimum number of seconds between simulation liveness messages (0 = always print)
-gpgpu_flush_l1_cache                    0 # Flush L1 cache at the end of each kernel call
-gpgpu_flush_l2_cache                    0 # Flush L2 cache at the end of each kernel call
-gpgpu_deadlock_detect                    1 # Stop the simulation at deadlock (1=on (default), 0=off)
-gpgpu_ptx_instruction_classification                    0 # if enabled will classify ptx instruction types per kernel (Max 255 kernels now)
-gpgpu_ptx_sim_mode                     0 # Select between Performance (default) or Functional simulation (1)
-gpgpu_clock_domains 700.0:700.0:700.0:924.0 # Clock Domain Frequencies in MhZ {<Core Clock>:<ICNT Clock>:<L2 Clock>:<DRAM Clock>}
-gpgpu_max_concurrent_kernel                    8 # maximum kernels that can run concurrently on GPU
-gpgpu_cflog_interval                    0 # Interval between each snapshot in control flow logger
-visualizer_enabled                     0 # Turn on visualizer output (1=On, 0=Off)
-visualizer_outputfile                 NULL # Specifies the output log file for visualizer
-visualizer_zlevel                      6 # Compression level of the visualizer output log (0=no comp, 9=highest)
-trace_enabled                          0 # Turn on traces
-trace_components                    none # comma seperated list of traces to enable. Complete list found in trace_streams.tup. Default none
-trace_sampling_core                    0 # The core which is printed using CORE_DPRINTF. Default 0
-trace_sampling_memory_partition                   -1 # The memory partition which is printed using MEMPART_DPRINTF. Default -1 (i.e. all)
-enable_ptx_file_line_stats                    1 # Turn on PTX source line statistic profiling. (1 = On)
-ptx_line_stats_filename gpgpu_inst_stats.txt # Output file for PTX source line statistics.
-save_embedded_ptx                      0 # saves ptx files embedded in binary as <n>.ptx
-keep                                   0 # keep intermediate files created by GPGPU-Sim when interfacing with external programs
-gpgpu_ptx_save_converted_ptxplus                    0 # Saved converted ptxplus to a file
-ptx_opcode_latency_int         4,13,4,5,145 # Opcode latencies for integers <ADD,MAX,MUL,MAD,DIV>Default 1,1,19,25,145
-ptx_opcode_latency_fp          4,13,4,5,39 # Opcode latencies for single precision floating points <ADD,MAX,MUL,MAD,DIV>Default 1,1,1,1,30
-ptx_opcode_latency_dp         8,19,8,8,330 # Opcode latencies for double precision floating points <ADD,MAX,MUL,MAD,DIV>Default 8,8,8,8,335
-ptx_opcode_initiation_int            1,2,2,1,8 # Opcode initiation intervals for integers <ADD,MAX,MUL,MAD,DIV>Default 1,1,4,4,32
-ptx_opcode_initiation_fp            1,2,1,1,4 # Opcode initiation intervals for single precision floating points <ADD,MAX,MUL,MAD,DIV>Default 1,1,1,1,5
-ptx_opcode_initiation_dp         8,16,8,8,130 # Opcode initiation intervals for double precision floating points <ADD,MAX,MUL,MAD,DIV>Default 8,8,8,8,130
DRAM Timing Options:
nbk                                    16 # number of banks
CCD                                     2 # column to column delay
RRD                                     6 # minimal delay between activation of rows in different banks
RCD                                    12 # row to column delay
RAS                                    28 # time needed to activate row
RP                                     12 # time needed to precharge (deactivate) row
RC                                     40 # row cycle time
CDLR                                    5 # switching from write to read (changes tWTR)
WR                                     12 # last data-in to row precharge
CL                                     12 # CAS latency
WL                                      4 # Write latency
nbkgrp                                  4 # number of bank groups
CCDL                                    3 # column to column delay between accesses to different bank groups
RTPL                                    2 # read to precharge delay between accesses to different bank groups
Total number of memory sub partition = 12
addr_dec_mask[CHIP]  = 0000000000000000 	high:64 low:0
addr_dec_mask[BK]    = 000000000000e100 	high:16 low:8
addr_dec_mask[ROW]   = 000000000fff0000 	high:28 low:16
addr_dec_mask[COL]   = 0000000000001eff 	high:13 low:0
addr_dec_mask[BURST] = 000000000000003f 	high:6 low:0
sub_partition_id_mask = 0000000000000100
GPGPU-Sim uArch: clock freqs: 700000000.000000:700000000.000000:700000000.000000:924000000.000000
GPGPU-Sim uArch: clock periods: 0.00000000142857142857:0.00000000142857142857:0.00000000142857142857:0.00000000108225108225
*** Initializing Memory Statistics ***
GPGPU-Sim uArch: interconnect node map (shaderID+MemID to icntID)
GPGPU-Sim uArch: Memory nodes ID start from index: 15
GPGPU-Sim uArch:    0   1   2   3   4
GPGPU-Sim uArch:    5   6   7   8   9
GPGPU-Sim uArch:   10  11  12  13  14
GPGPU-Sim uArch:   15  16  17  18  19
GPGPU-Sim uArch:   20  21  22  23  24
GPGPU-Sim uArch:   25  26
GPGPU-Sim uArch: interconnect node reverse map (icntID to shaderID+MemID)
GPGPU-Sim uArch: Memory nodes start from ID: 15
GPGPU-Sim uArch:    0   1   2   3   4
GPGPU-Sim uArch:    5   6   7   8   9
GPGPU-Sim uArch:   10  11  12  13  14
GPGPU-Sim uArch:   15  16  17  18  19
GPGPU-Sim uArch:   20  21  22  23  24
GPGPU-Sim uArch:   25  26
8b51d2418a0658287a30fe3c4cc1fd21  /home/ly/下载/test/gpgpu-sim_distribution-master/ispass2009-benchmarks-master_2/bin/release/MM
GPGPU-Sim uArch: performance model initialization complete.
GPGPU-Sim PTX: __cudaRegisterFatBinary, fat_cubin_handle = 1, filename=mm.cu
self exe links to: /home/ly/下载/test/gpgpu-sim_distribution-master/ispass2009-benchmarks-master_2/bin/release/MM
Running md5sum using "md5sum /home/ly/下载/test/gpgpu-sim_distribution-master/ispass2009-benchmarks-master_2/bin/release/MM "
Running cuobjdump using "$CUDA_INSTALL_PATH/bin/cuobjdump -ptx -elf -sass /home/ly/下载/test/gpgpu-sim_distribution-master/ispass2009-benchmarks-master_2/bin/release/MM > _cuobjdump_complete_output_u2uivH"
Parsing file _cuobjdump_complete_output_u2uivH
######### cuobjdump parser ########
## Adding new section ELF
Adding arch: sm_10
Adding identifier: mm.cu
## Adding new section PTX
Adding ptx filename: _cuobjdump_1.ptx
Adding arch: sm_10
Adding identifier: mm.cu
## Adding new section ELF
Adding arch: sm_20
Adding identifier: mm.cu
## Adding new section PTX
Adding ptx filename: _cuobjdump_2.ptx
Adding arch: sm_20
Adding identifier: mm.cu
Done parsing!!!
GPGPU-Sim PTX: __cudaRegisterFunction _Z14matrix_mul_gpuPiS_S_i : hostFun 0x0x400ce0, fat_cubin_handle = 1
GPGPU-Sim PTX: instruction assembly for function '_Z14matrix_mul_gpuPiS_S_i'...   done.
GPGPU-Sim PTX: finding reconvergence points for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: Finding dominators for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: Finding immediate dominators for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: Finding postdominators for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: Finding immediate postdominators for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: pre-decoding instructions for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: reconvergence points for _Z14matrix_mul_gpuPiS_S_i...
GPGPU-Sim PTX:  1 (potential) branch divergence @  PC=0x048 (_1.ptx:71) @%p1 bra $Lt_0_2306;
GPGPU-Sim PTX:    immediate post dominator      @  PC=0x170 (_1.ptx:114) ld.param.u64 %rd11, [__cudaparm__Z14matrix_mul_gpuPiS_S_i_P];
GPGPU-Sim PTX:  2 (potential) branch divergence @  PC=0x130 (_1.ptx:103) @%p2 bra $Lt_0_1794;
GPGPU-Sim PTX:    immediate post dominator      @  PC=0x138 (_1.ptx:104) bra.uni $Lt_0_1282;
GPGPU-Sim PTX:  3 (potential) branch divergence @  PC=0x138 (_1.ptx:104) bra.uni $Lt_0_1282;
GPGPU-Sim PTX:    immediate post dominator      @  PC=0x170 (_1.ptx:114) ld.param.u64 %rd11, [__cudaparm__Z14matrix_mul_gpuPiS_S_i_P];
GPGPU-Sim PTX: ... end of reconvergence points for _Z14matrix_mul_gpuPiS_S_i
GPGPU-Sim PTX: ... done pre-decoding instructions for '_Z14matrix_mul_gpuPiS_S_i'.
GPGPU-Sim PTX: finished parsing EMBEDDED .ptx file _1.ptx
Adding _cuobjdump_2.ptx with cubin handle 1
GPGPU-Sim PTX: extracting embedded .ptx to temporary file "_ptx_f6py2W"
Running: cat _ptx_f6py2W | sed 's/.version 1.5/.version 1.4/' | sed 's/, texmode_independent//' | sed 's/\(\.extern \.const\[1\] .b8 \w\+\)\[\]/\1\[1\]/' | sed 's/const\[.\]/const\[0\]/g' > _ptx2_ur0Ozc
GPGPU-Sim PTX: generating ptxinfo using "$CUDA_INSTALL_PATH/bin/ptxas --gpu-name=sm_20 -v _ptx2_ur0Ozc --output-file  /dev/null 2> _ptx_f6py2Winfo"
GPGPU-Sim PTX: Kernel '_Z14matrix_mul_gpuPiS_S_i' : regs=14, lmem=0, smem=0, cmem=60
GPGPU-Sim PTX: removing ptxinfo using "rm -f _ptx_f6py2W _ptx2_ur0Ozc _ptx_f6py2Winfo"
GPGPU-Sim PTX: loading globals with explicit initializers... 
GPGPU-Sim PTX: finished loading globals (0 bytes total).
GPGPU-Sim PTX: loading constants with explicit initializers...  done.
Block(10,10)   Grid(15,15).

GPGPU-Sim PTX: cudaLaunch for 0x0x400ce0 (mode=performance simulation) on stream 0
GPGPU-Sim PTX: pushing kernel '_Z14matrix_mul_gpuPiS_S_i' to stream 0, gridDim= (15,15,1) blockDim = (10,10,1) 
kernel '_Z14matrix_mul_gpuPiS_S_i' transfer to GPU hardware scheduler
GPGPU-Sim uArch: Shader 8 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: CTA/core = 8, limited by: cta_limit
GPGPU-Sim uArch: core:  8, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 16 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 16, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 24 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 24, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 32 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 32, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 40 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 40, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 48 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 48, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 56 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 56, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 64 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 64, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 72 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 72, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 80 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 80, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 88 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 88, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 96 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 96, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 104 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:104, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 112 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:112, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 0 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  0, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 9 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  9, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 17 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 17, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 25 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 25, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 33 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 33, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 41 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 41, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 49 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 49, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 57 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 57, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 65 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 65, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 73 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 73, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 81 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 81, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 89 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 89, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 97 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 97, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 105 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:105, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 113 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:113, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 1 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  1, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 10 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 10, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 18 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 18, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 26 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 26, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 34 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 34, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 42 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 42, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 50 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 50, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 58 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 58, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 66 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 66, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 74 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 74, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 82 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 82, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 90 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 90, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 98 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 98, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 106 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:106, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 114 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:114, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 2 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  2, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 11 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 11, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 19 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 19, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 27 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 27, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 35 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 35, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 43 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 43, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 51 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 51, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 59 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 59, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 67 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 67, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 75 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 75, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 83 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 83, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 91 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 91, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 99 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 99, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 107 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:107, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 115 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:115, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 3 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  3, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 12 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 12, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 20 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 20, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 28 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 28, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 36 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 36, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 44 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 44, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 52 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 52, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 60 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 60, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 68 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 68, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 76 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 76, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 84 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 84, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 92 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 92, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 100 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:100, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 108 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:108, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 116 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:116, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 4 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  4, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 13 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 13, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 21 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 21, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 29 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 29, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 37 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 37, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 45 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 45, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 53 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 53, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 61 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 61, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 69 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 69, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 77 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 77, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 85 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 85, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 93 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 93, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 101 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:101, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 109 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:109, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 117 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:117, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 5 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  5, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 14 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 14, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 22 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 22, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 30 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 30, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 38 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 38, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 46 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 46, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 54 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 54, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 62 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 62, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 70 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 70, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 78 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 78, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 86 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 86, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 94 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 94, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 102 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:102, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 110 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:110, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 118 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:118, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 6 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  6, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 15 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 15, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 23 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 23, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 31 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 31, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 39 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 39, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 47 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 47, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 55 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 55, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 63 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 63, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 71 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 71, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 79 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 79, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 87 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 87, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 95 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 95, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 103 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:103, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 111 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:111, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 119 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:119, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 7 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  7, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: core:  8, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 16, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 24, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 32, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 40, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 48, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 56, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 64, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 72, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 80, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 88, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 96, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core:104, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core:112, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core:  0, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core:  9, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 17, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 25, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 33, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 41, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 49, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 57, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 65, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 73, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 81, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 89, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 97, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core:105, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core:113, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core:  1, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 10, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 18, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 26, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 34, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 42, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 50, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 58, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 66, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 74, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 82, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 90, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 98, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core:106, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core:114, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core:  2, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 11, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 19, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 27, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 35, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 43, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 51, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 59, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 67, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 75, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 83, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 91, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 99, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core:107, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core:115, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core:  3, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 12, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 20, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 28, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 36, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 44, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 52, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 60, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 68, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 76, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 84, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 92, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core:100, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core:108, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core:116, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core:  4, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 13, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 21, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 29, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 37, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 45, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 53, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 61, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 69, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 77, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 85, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 93, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core:101, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core:109, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core:117, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core:  5, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 14, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 22, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 30, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 38, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 46, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 54, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 62, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 70, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 78, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 86, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 94, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core:102, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core:110, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core:118, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core:  6, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: cycles simulated: 500  inst.: 49456 (ipc=98.9) sim_rate=49456 (inst/sec) elapsed = 0:0:00:01 / Mon Jun 14 16:16:48 2021
GPGPU-Sim PTX: 100000 instructions simulated : ctaid=(2,12,0) tid=(5,7,0)
GPGPU-Sim uArch: cycles simulated: 1000  inst.: 155464 (ipc=155.5) sim_rate=77732 (inst/sec) elapsed = 0:0:00:02 / Mon Jun 14 16:16:49 2021
GPGPU-Sim PTX: 200000 instructions simulated : ctaid=(1,9,0) tid=(7,2,0)
GPGPU-Sim PTX: 300000 instructions simulated : ctaid=(13,12,0) tid=(9,1,0)
GPGPU-Sim uArch: cycles simulated: 1500  inst.: 294800 (ipc=196.5) sim_rate=98266 (inst/sec) elapsed = 0:0:00:03 / Mon Jun 14 16:16:50 2021
GPGPU-Sim PTX: 400000 instructions simulated : ctaid=(6,0,0) tid=(5,9,0)
GPGPU-Sim PTX: 500000 instructions simulated : ctaid=(3,11,0) tid=(7,4,0)
GPGPU-Sim uArch: cycles simulated: 2000  inst.: 460980 (ipc=230.5) sim_rate=115245 (inst/sec) elapsed = 0:0:00:04 / Mon Jun 14 16:16:51 2021
GPGPU-Sim PTX: 600000 instructions simulated : ctaid=(0,5,0) tid=(5,5,0)
GPGPU-Sim PTX: 700000 instructions simulated : ctaid=(3,7,0) tid=(1,1,0)
GPGPU-Sim uArch: cycles simulated: 2500  inst.: 658596 (ipc=263.4) sim_rate=131719 (inst/sec) elapsed = 0:0:00:05 / Mon Jun 14 16:16:52 2021
GPGPU-Sim uArch: cycles simulated: 3000  inst.: 686456 (ipc=228.8) sim_rate=114409 (inst/sec) elapsed = 0:0:00:06 / Mon Jun 14 16:16:53 2021
GPGPU-Sim uArch: cycles simulated: 3500  inst.: 722996 (ipc=206.6) sim_rate=103285 (inst/sec) elapsed = 0:0:00:07 / Mon Jun 14 16:16:54 2021
GPGPU-Sim PTX: 800000 instructions simulated : ctaid=(11,9,0) tid=(9,7,0)
GPGPU-Sim uArch: cycles simulated: 4500  inst.: 852268 (ipc=189.4) sim_rate=106533 (inst/sec) elapsed = 0:0:00:08 / Mon Jun 14 16:16:55 2021
GPGPU-Sim PTX: 900000 instructions simulated : ctaid=(4,10,0) tid=(9,1,0)
GPGPU-Sim PTX: 1000000 instructions simulated : ctaid=(9,0,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 5000  inst.: 1010580 (ipc=202.1) sim_rate=112286 (inst/sec) elapsed = 0:0:00:09 / Mon Jun 14 16:16:56 2021
GPGPU-Sim PTX: 1100000 instructions simulated : ctaid=(14,2,0) tid=(3,0,0)
GPGPU-Sim PTX: 1200000 instructions simulated : ctaid=(13,8,0) tid=(1,7,0)
GPGPU-Sim PTX: 1300000 instructions simulated : ctaid=(1,12,0) tid=(5,1,0)
GPGPU-Sim PTX: 1400000 instructions simulated : ctaid=(3,6,0) tid=(9,3,0)
GPGPU-Sim uArch: cycles simulated: 5500  inst.: 1387024 (ipc=252.2) sim_rate=126093 (inst/sec) elapsed = 0:0:00:11 / Mon Jun 14 16:16:58 2021
GPGPU-Sim PTX: 1500000 instructions simulated : ctaid=(10,10,0) tid=(3,6,0)
GPGPU-Sim PTX: 1600000 instructions simulated : ctaid=(6,3,0) tid=(3,4,0)
GPGPU-Sim PTX: 1700000 instructions simulated : ctaid=(11,7,0) tid=(7,0,0)
GPGPU-Sim PTX: 1800000 instructions simulated : ctaid=(4,10,0) tid=(9,7,0)
GPGPU-Sim uArch: cycles simulated: 6000  inst.: 1834944 (ipc=305.8) sim_rate=152912 (inst/sec) elapsed = 0:0:00:12 / Mon Jun 14 16:16:59 2021
GPGPU-Sim PTX: 1900000 instructions simulated : ctaid=(10,0,0) tid=(5,3,0)
GPGPU-Sim PTX: 2000000 instructions simulated : ctaid=(12,3,0) tid=(9,5,0)
GPGPU-Sim PTX: 2100000 instructions simulated : ctaid=(3,1,0) tid=(3,4,0)
GPGPU-Sim PTX: 2200000 instructions simulated : ctaid=(8,9,0) tid=(3,8,0)
GPGPU-Sim PTX: 2300000 instructions simulated : ctaid=(12,7,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 6500  inst.: 2264888 (ipc=348.4) sim_rate=174222 (inst/sec) elapsed = 0:0:00:13 / Mon Jun 14 16:17:00 2021
GPGPU-Sim PTX: 2400000 instructions simulated : ctaid=(6,6,0) tid=(1,9,0)
GPGPU-Sim PTX: 2500000 instructions simulated : ctaid=(9,9,0) tid=(3,0,0)
GPGPU-Sim PTX: 2600000 instructions simulated : ctaid=(4,12,0) tid=(3,4,0)
GPGPU-Sim PTX: 2700000 instructions simulated : ctaid=(12,1,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 7000  inst.: 2673128 (ipc=381.9) sim_rate=190937 (inst/sec) elapsed = 0:0:00:14 / Mon Jun 14 16:17:01 2021
GPGPU-Sim PTX: 2800000 instructions simulated : ctaid=(3,0,0) tid=(7,2,0)
GPGPU-Sim PTX: 2900000 instructions simulated : ctaid=(0,10,0) tid=(1,5,0)
GPGPU-Sim PTX: 3000000 instructions simulated : ctaid=(5,4,0) tid=(1,3,0)
GPGPU-Sim uArch: cycles simulated: 7500  inst.: 3038248 (ipc=405.1) sim_rate=202549 (inst/sec) elapsed = 0:0:00:15 / Mon Jun 14 16:17:02 2021
GPGPU-Sim PTX: 3100000 instructions simulated : ctaid=(5,1,0) tid=(5,1,0)
GPGPU-Sim PTX: 3200000 instructions simulated : ctaid=(12,11,0) tid=(7,8,0)
GPGPU-Sim PTX: 3300000 instructions simulated : ctaid=(5,1,0) tid=(5,1,0)
GPGPU-Sim PTX: 3400000 instructions simulated : ctaid=(4,12,0) tid=(9,7,0)
GPGPU-Sim uArch: cycles simulated: 8000  inst.: 3416336 (ipc=427.0) sim_rate=200960 (inst/sec) elapsed = 0:0:00:17 / Mon Jun 14 16:17:04 2021
GPGPU-Sim PTX: 3500000 instructions simulated : ctaid=(11,13,0) tid=(1,9,0)
GPGPU-Sim PTX: 3600000 instructions simulated : ctaid=(10,12,0) tid=(9,1,0)
GPGPU-Sim PTX: 3700000 instructions simulated : ctaid=(4,8,0) tid=(7,4,0)
GPGPU-Sim PTX: 3800000 instructions simulated : ctaid=(13,11,0) tid=(3,0,0)
GPGPU-Sim uArch: cycles simulated: 8500  inst.: 3794960 (ipc=446.5) sim_rate=210831 (inst/sec) elapsed = 0:0:00:18 / Mon Jun 14 16:17:05 2021
GPGPU-Sim PTX: 3900000 instructions simulated : ctaid=(10,10,0) tid=(5,3,0)
GPGPU-Sim PTX: 4000000 instructions simulated : ctaid=(2,8,0) tid=(1,3,0)
GPGPU-Sim PTX: 4100000 instructions simulated : ctaid=(8,14,0) tid=(9,1,0)
GPGPU-Sim uArch: cycles simulated: 9000  inst.: 4145552 (ipc=460.6) sim_rate=218186 (inst/sec) elapsed = 0:0:00:19 / Mon Jun 14 16:17:06 2021
GPGPU-Sim PTX: 4200000 instructions simulated : ctaid=(13,11,0) tid=(3,8,0)
GPGPU-Sim PTX: 4300000 instructions simulated : ctaid=(9,1,0) tid=(3,2,0)
GPGPU-Sim PTX: 4400000 instructions simulated : ctaid=(7,5,0) tid=(1,1,0)
GPGPU-Sim PTX: 4500000 instructions simulated : ctaid=(4,6,0) tid=(1,9,0)
GPGPU-Sim uArch: cycles simulated: 9500  inst.: 4494464 (ipc=473.1) sim_rate=224723 (inst/sec) elapsed = 0:0:00:20 / Mon Jun 14 16:17:07 2021
GPGPU-Sim PTX: 4600000 instructions simulated : ctaid=(10,0,0) tid=(9,9,0)
GPGPU-Sim PTX: 4700000 instructions simulated : ctaid=(3,14,0) tid=(9,3,0)
GPGPU-Sim PTX: 4800000 instructions simulated : ctaid=(11,7,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 10000  inst.: 4846372 (ipc=484.6) sim_rate=230779 (inst/sec) elapsed = 0:0:00:21 / Mon Jun 14 16:17:08 2021
GPGPU-Sim PTX: 4900000 instructions simulated : ctaid=(3,10,0) tid=(5,3,0)
GPGPU-Sim PTX: 5000000 instructions simulated : ctaid=(10,14,0) tid=(5,9,0)
GPGPU-Sim PTX: 5100000 instructions simulated : ctaid=(8,10,0) tid=(1,3,0)
GPGPU-Sim PTX: 5200000 instructions simulated : ctaid=(11,1,0) tid=(1,9,0)
GPGPU-Sim uArch: cycles simulated: 10500  inst.: 5179948 (ipc=493.3) sim_rate=225215 (inst/sec) elapsed = 0:0:00:23 / Mon Jun 14 16:17:10 2021
GPGPU-Sim PTX: 5300000 instructions simulated : ctaid=(7,0,0) tid=(9,3,0)
GPGPU-Sim PTX: 5400000 instructions simulated : ctaid=(9,7,0) tid=(9,5,0)
GPGPU-Sim PTX: 5500000 instructions simulated : ctaid=(1,14,0) tid=(7,6,0)
GPGPU-Sim uArch: cycles simulated: 11000  inst.: 5542924 (ipc=503.9) sim_rate=230955 (inst/sec) elapsed = 0:0:00:24 / Mon Jun 14 16:17:11 2021
GPGPU-Sim PTX: 5600000 instructions simulated : ctaid=(3,8,0) tid=(7,8,0)
GPGPU-Sim PTX: 5700000 instructions simulated : ctaid=(8,0,0) tid=(3,8,0)
GPGPU-Sim PTX: 5800000 instructions simulated : ctaid=(2,7,0) tid=(1,3,0)
GPGPU-Sim PTX: 5900000 instructions simulated : ctaid=(4,12,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 11500  inst.: 5863464 (ipc=509.9) sim_rate=234538 (inst/sec) elapsed = 0:0:00:25 / Mon Jun 14 16:17:12 2021
GPGPU-Sim PTX: 6000000 instructions simulated : ctaid=(10,4,0) tid=(7,4,0)
GPGPU-Sim PTX: 6100000 instructions simulated : ctaid=(7,9,0) tid=(5,7,0)
GPGPU-Sim PTX: 6200000 instructions simulated : ctaid=(9,7,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 12000  inst.: 6198952 (ipc=516.6) sim_rate=238421 (inst/sec) elapsed = 0:0:00:26 / Mon Jun 14 16:17:13 2021
GPGPU-Sim PTX: 6300000 instructions simulated : ctaid=(0,11,0) tid=(9,7,0)
GPGPU-Sim PTX: 6400000 instructions simulated : ctaid=(11,5,0) tid=(9,3,0)
GPGPU-Sim PTX: 6500000 instructions simulated : ctaid=(6,10,0) tid=(3,0,0)
GPGPU-Sim uArch: cycles simulated: 12500  inst.: 6514828 (ipc=521.2) sim_rate=241289 (inst/sec) elapsed = 0:0:00:27 / Mon Jun 14 16:17:14 2021
GPGPU-Sim PTX: 6600000 instructions simulated : ctaid=(9,0,0) tid=(3,6,0)
GPGPU-Sim PTX: 6700000 instructions simulated : ctaid=(7,2,0) tid=(5,9,0)
GPGPU-Sim PTX: 6800000 instructions simulated : ctaid=(1,14,0) tid=(9,5,0)
GPGPU-Sim PTX: 6900000 instructions simulated : ctaid=(9,14,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 13000  inst.: 6850760 (ipc=527.0) sim_rate=244670 (inst/sec) elapsed = 0:0:00:28 / Mon Jun 14 16:17:15 2021
GPGPU-Sim PTX: 7000000 instructions simulated : ctaid=(5,1,0) tid=(1,7,0)
GPGPU-Sim PTX: 7100000 instructions simulated : ctaid=(13,0,0) tid=(5,1,0)
GPGPU-Sim PTX: 7200000 instructions simulated : ctaid=(10,11,0) tid=(9,3,0)
GPGPU-Sim uArch: cycles simulated: 13500  inst.: 7177796 (ipc=531.7) sim_rate=247510 (inst/sec) elapsed = 0:0:00:29 / Mon Jun 14 16:17:16 2021
GPGPU-Sim PTX: 7300000 instructions simulated : ctaid=(3,5,0) tid=(3,4,0)
GPGPU-Sim PTX: 7400000 instructions simulated : ctaid=(1,12,0) tid=(3,0,0)
GPGPU-Sim PTX: 7500000 instructions simulated : ctaid=(2,12,0) tid=(3,2,0)
GPGPU-Sim uArch: cycles simulated: 14000  inst.: 7513232 (ipc=536.7) sim_rate=242362 (inst/sec) elapsed = 0:0:00:31 / Mon Jun 14 16:17:18 2021
GPGPU-Sim PTX: 7600000 instructions simulated : ctaid=(12,4,0) tid=(3,8,0)
GPGPU-Sim PTX: 7700000 instructions simulated : ctaid=(5,6,0) tid=(5,9,0)
GPGPU-Sim PTX: 7800000 instructions simulated : ctaid=(10,0,0) tid=(7,4,0)
GPGPU-Sim PTX: 7900000 instructions simulated : ctaid=(11,5,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 14500  inst.: 7861928 (ipc=542.2) sim_rate=245685 (inst/sec) elapsed = 0:0:00:32 / Mon Jun 14 16:17:19 2021
GPGPU-Sim PTX: 8000000 instructions simulated : ctaid=(13,13,0) tid=(5,5,0)
GPGPU-Sim PTX: 8100000 instructions simulated : ctaid=(8,7,0) tid=(1,3,0)
GPGPU-Sim PTX: 8200000 instructions simulated : ctaid=(9,13,0) tid=(7,6,0)
GPGPU-Sim uArch: cycles simulated: 15000  inst.: 8177372 (ipc=545.2) sim_rate=247799 (inst/sec) elapsed = 0:0:00:33 / Mon Jun 14 16:17:20 2021
GPGPU-Sim PTX: 8300000 instructions simulated : ctaid=(4,4,0) tid=(3,0,0)
GPGPU-Sim PTX: 8400000 instructions simulated : ctaid=(10,10,0) tid=(1,7,0)
GPGPU-Sim PTX: 8500000 instructions simulated : ctaid=(0,9,0) tid=(9,9,0)
GPGPU-Sim uArch: cycles simulated: 15500  inst.: 8534132 (ipc=550.6) sim_rate=251003 (inst/sec) elapsed = 0:0:00:34 / Mon Jun 14 16:17:21 2021
GPGPU-Sim PTX: 8600000 instructions simulated : ctaid=(12,12,0) tid=(1,9,0)
GPGPU-Sim PTX: 8700000 instructions simulated : ctaid=(4,3,0) tid=(1,3,0)
GPGPU-Sim PTX: 8800000 instructions simulated : ctaid=(9,14,0) tid=(1,1,0)
GPGPU-Sim uArch: cycles simulated: 16000  inst.: 8846628 (ipc=552.9) sim_rate=252760 (inst/sec) elapsed = 0:0:00:35 / Mon Jun 14 16:17:22 2021
GPGPU-Sim PTX: 8900000 instructions simulated : ctaid=(0,1,0) tid=(7,4,0)
GPGPU-Sim PTX: 9000000 instructions simulated : ctaid=(10,11,0) tid=(1,7,0)
GPGPU-Sim PTX: 9100000 instructions simulated : ctaid=(12,14,0) tid=(1,9,0)
GPGPU-Sim PTX: 9200000 instructions simulated : ctaid=(10,7,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 16500  inst.: 9193752 (ipc=557.2) sim_rate=255382 (inst/sec) elapsed = 0:0:00:36 / Mon Jun 14 16:17:23 2021
GPGPU-Sim PTX: 9300000 instructions simulated : ctaid=(10,2,0) tid=(1,9,0)
GPGPU-Sim PTX: 9400000 instructions simulated : ctaid=(2,12,0) tid=(1,9,0)
GPGPU-Sim PTX: 9500000 instructions simulated : ctaid=(3,9,0) tid=(7,8,0)
GPGPU-Sim uArch: cycles simulated: 17000  inst.: 9519480 (ipc=560.0) sim_rate=250512 (inst/sec) elapsed = 0:0:00:38 / Mon Jun 14 16:17:25 2021
GPGPU-Sim PTX: 9600000 instructions simulated : ctaid=(12,9,0) tid=(5,9,0)
GPGPU-Sim PTX: 9700000 instructions simulated : ctaid=(1,3,0) tid=(7,0,0)
GPGPU-Sim PTX: 9800000 instructions simulated : ctaid=(7,0,0) tid=(5,7,0)
GPGPU-Sim uArch: cycles simulated: 17500  inst.: 9845216 (ipc=562.6) sim_rate=252441 (inst/sec) elapsed = 0:0:00:39 / Mon Jun 14 16:17:26 2021
GPGPU-Sim PTX: 9900000 instructions simulated : ctaid=(1,6,0) tid=(3,2,0)
GPGPU-Sim PTX: 10000000 instructions simulated : ctaid=(10,13,0) tid=(9,1,0)
GPGPU-Sim PTX: 10100000 instructions simulated : ctaid=(10,10,0) tid=(1,5,0)
GPGPU-Sim PTX: 10200000 instructions simulated : ctaid=(2,4,0) tid=(1,5,0)
GPGPU-Sim uArch: cycles simulated: 18000  inst.: 10175904 (ipc=565.3) sim_rate=254397 (inst/sec) elapsed = 0:0:00:40 / Mon Jun 14 16:17:27 2021
GPGPU-Sim PTX: 10300000 instructions simulated : ctaid=(10,6,0) tid=(9,7,0)
GPGPU-Sim PTX: 10400000 instructions simulated : ctaid=(8,8,0) tid=(9,9,0)
GPGPU-Sim PTX: 10500000 instructions simulated : ctaid=(13,8,0) tid=(3,0,0)
GPGPU-Sim uArch: cycles simulated: 18500  inst.: 10526504 (ipc=569.0) sim_rate=256744 (inst/sec) elapsed = 0:0:00:41 / Mon Jun 14 16:17:28 2021
GPGPU-Sim PTX: 10600000 instructions simulated : ctaid=(13,12,0) tid=(5,7,0)
GPGPU-Sim PTX: 10700000 instructions simulated : ctaid=(14,2,0) tid=(7,4,0)
GPGPU-Sim PTX: 10800000 instructions simulated : ctaid=(9,1,0) tid=(3,6,0)
GPGPU-Sim PTX: 10900000 instructions simulated : ctaid=(14,6,0) tid=(7,8,0)
GPGPU-Sim uArch: cycles simulated: 19000  inst.: 10861260 (ipc=571.6) sim_rate=258601 (inst/sec) elapsed = 0:0:00:42 / Mon Jun 14 16:17:29 2021
GPGPU-Sim PTX: 11000000 instructions simulated : ctaid=(3,13,0) tid=(1,3,0)
GPGPU-Sim PTX: 11100000 instructions simulated : ctaid=(13,11,0) tid=(7,6,0)
GPGPU-Sim PTX: 11200000 instructions simulated : ctaid=(11,6,0) tid=(7,6,0)
GPGPU-Sim uArch: cycles simulated: 19500  inst.: 11182232 (ipc=573.4) sim_rate=260051 (inst/sec) elapsed = 0:0:00:43 / Mon Jun 14 16:17:30 2021
GPGPU-Sim PTX: 11300000 instructions simulated : ctaid=(7,7,0) tid=(5,9,0)
GPGPU-Sim PTX: 11400000 instructions simulated : ctaid=(9,3,0) tid=(9,1,0)
GPGPU-Sim PTX: 11500000 instructions simulated : ctaid=(14,10,0) tid=(1,5,0)
GPGPU-Sim uArch: cycles simulated: 20000  inst.: 11510844 (ipc=575.5) sim_rate=261610 (inst/sec) elapsed = 0:0:00:44 / Mon Jun 14 16:17:31 2021
GPGPU-Sim PTX: 11600000 instructions simulated : ctaid=(2,13,0) tid=(3,0,0)
GPGPU-Sim PTX: 11700000 instructions simulated : ctaid=(10,10,0) tid=(5,3,0)
GPGPU-Sim PTX: 11800000 instructions simulated : ctaid=(1,6,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 20500  inst.: 11845228 (ipc=577.8) sim_rate=257504 (inst/sec) elapsed = 0:0:00:46 / Mon Jun 14 16:17:33 2021
GPGPU-Sim PTX: 11900000 instructions simulated : ctaid=(12,9,0) tid=(5,1,0)
GPGPU-Sim PTX: 12000000 instructions simulated : ctaid=(9,2,0) tid=(9,5,0)
GPGPU-Sim PTX: 12100000 instructions simulated : ctaid=(10,1,0) tid=(7,6,0)
GPGPU-Sim PTX: 12200000 instructions simulated : ctaid=(12,2,0) tid=(3,2,0)
GPGPU-Sim uArch: cycles simulated: 21000  inst.: 12183192 (ipc=580.2) sim_rate=259216 (inst/sec) elapsed = 0:0:00:47 / Mon Jun 14 16:17:34 2021
GPGPU-Sim PTX: 12300000 instructions simulated : ctaid=(7,7,0) tid=(7,6,0)
GPGPU-Sim PTX: 12400000 instructions simulated : ctaid=(9,1,0) tid=(3,0,0)
GPGPU-Sim PTX: 12500000 instructions simulated : ctaid=(14,10,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 21500  inst.: 12511912 (ipc=581.9) sim_rate=260664 (inst/sec) elapsed = 0:0:00:48 / Mon Jun 14 16:17:35 2021
GPGPU-Sim PTX: 12600000 instructions simulated : ctaid=(13,7,0) tid=(9,9,0)
GPGPU-Sim PTX: 12700000 instructions simulated : ctaid=(0,13,0) tid=(5,5,0)
GPGPU-Sim PTX: 12800000 instructions simulated : ctaid=(5,9,0) tid=(1,9,0)
GPGPU-Sim PTX: 12900000 instructions simulated : ctaid=(5,13,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 22000  inst.: 12861064 (ipc=584.6) sim_rate=262470 (inst/sec) elapsed = 0:0:00:49 / Mon Jun 14 16:17:36 2021
GPGPU-Sim PTX: 13000000 instructions simulated : ctaid=(14,11,0) tid=(9,7,0)
GPGPU-Sim PTX: 13100000 instructions simulated : ctaid=(1,14,0) tid=(9,3,0)
GPGPU-Sim PTX: 13200000 instructions simulated : ctaid=(11,12,0) tid=(3,6,0)
GPGPU-Sim uArch: cycles simulated: 22500  inst.: 13200500 (ipc=586.7) sim_rate=264010 (inst/sec) elapsed = 0:0:00:50 / Mon Jun 14 16:17:37 2021
GPGPU-Sim PTX: 13300000 instructions simulated : ctaid=(11,1,0) tid=(7,0,0)
GPGPU-Sim PTX: 13400000 instructions simulated : ctaid=(11,4,0) tid=(5,9,0)
GPGPU-Sim PTX: 13500000 instructions simulated : ctaid=(3,13,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 23000  inst.: 13520992 (ipc=587.9) sim_rate=265117 (inst/sec) elapsed = 0:0:00:51 / Mon Jun 14 16:17:38 2021
GPGPU-Sim PTX: 13600000 instructions simulated : ctaid=(5,12,0) tid=(9,5,0)
GPGPU-Sim PTX: 13700000 instructions simulated : ctaid=(3,0,0) tid=(9,7,0)
GPGPU-Sim PTX: 13800000 instructions simulated : ctaid=(1,2,0) tid=(3,6,0)
GPGPU-Sim uArch: cycles simulated: 23500  inst.: 13840084 (ipc=588.9) sim_rate=266155 (inst/sec) elapsed = 0:0:00:52 / Mon Jun 14 16:17:39 2021
GPGPU-Sim PTX: 13900000 instructions simulated : ctaid=(11,14,0) tid=(1,9,0)
GPGPU-Sim PTX: 14000000 instructions simulated : ctaid=(7,6,0) tid=(5,1,0)
GPGPU-Sim PTX: 14100000 instructions simulated : ctaid=(9,13,0) tid=(7,2,0)
GPGPU-Sim PTX: 14200000 instructions simulated : ctaid=(14,8,0) tid=(5,1,0)
GPGPU-Sim uArch: cycles simulated: 24000  inst.: 14188812 (ipc=591.2) sim_rate=262755 (inst/sec) elapsed = 0:0:00:54 / Mon Jun 14 16:17:41 2021
GPGPU-Sim PTX: 14300000 instructions simulated : ctaid=(10,11,0) tid=(7,0,0)
GPGPU-Sim PTX: 14400000 instructions simulated : ctaid=(13,13,0) tid=(3,2,0)
GPGPU-Sim PTX: 14500000 instructions simulated : ctaid=(3,10,0) tid=(3,2,0)
GPGPU-Sim uArch: cycles simulated: 24500  inst.: 14517376 (ipc=592.5) sim_rate=263952 (inst/sec) elapsed = 0:0:00:55 / Mon Jun 14 16:17:42 2021
GPGPU-Sim PTX: 14600000 instructions simulated : ctaid=(12,7,0) tid=(5,1,0)
GPGPU-Sim PTX: 14700000 instructions simulated : ctaid=(7,7,0) tid=(5,7,0)
GPGPU-Sim PTX: 14800000 instructions simulated : ctaid=(13,8,0) tid=(3,4,0)
GPGPU-Sim uArch: cycles simulated: 25000  inst.: 14846428 (ipc=593.9) sim_rate=265114 (inst/sec) elapsed = 0:0:00:56 / Mon Jun 14 16:17:43 2021
GPGPU-Sim PTX: 14900000 instructions simulated : ctaid=(9,5,0) tid=(9,9,0)
GPGPU-Sim PTX: 15000000 instructions simulated : ctaid=(2,8,0) tid=(3,8,0)
GPGPU-Sim PTX: 15100000 instructions simulated : ctaid=(8,13,0) tid=(9,5,0)
GPGPU-Sim PTX: 15200000 instructions simulated : ctaid=(4,12,0) tid=(1,1,0)
GPGPU-Sim uArch: cycles simulated: 25500  inst.: 15182476 (ipc=595.4) sim_rate=266359 (inst/sec) elapsed = 0:0:00:57 / Mon Jun 14 16:17:44 2021
GPGPU-Sim PTX: 15300000 instructions simulated : ctaid=(6,12,0) tid=(3,2,0)
GPGPU-Sim PTX: 15400000 instructions simulated : ctaid=(13,0,0) tid=(1,7,0)
GPGPU-Sim PTX: 15500000 instructions simulated : ctaid=(10,1,0) tid=(1,1,0)
GPGPU-Sim uArch: cycles simulated: 26000  inst.: 15506780 (ipc=596.4) sim_rate=267358 (inst/sec) elapsed = 0:0:00:58 / Mon Jun 14 16:17:45 2021
GPGPU-Sim PTX: 15600000 instructions simulated : ctaid=(2,11,0) tid=(1,1,0)
GPGPU-Sim PTX: 15700000 instructions simulated : ctaid=(5,11,0) tid=(9,7,0)
GPGPU-Sim PTX: 15800000 instructions simulated : ctaid=(2,13,0) tid=(9,1,0)
GPGPU-Sim PTX: 15900000 instructions simulated : ctaid=(6,5,0) tid=(9,1,0)
GPGPU-Sim uArch: cycles simulated: 26500  inst.: 15853292 (ipc=598.2) sim_rate=268699 (inst/sec) elapsed = 0:0:00:59 / Mon Jun 14 16:17:46 2021
GPGPU-Sim PTX: 16000000 instructions simulated : ctaid=(13,2,0) tid=(7,2,0)
GPGPU-Sim PTX: 16100000 instructions simulated : ctaid=(14,0,0) tid=(1,9,0)
GPGPU-Sim PTX: 16200000 instructions simulated : ctaid=(12,10,0) tid=(9,7,0)
GPGPU-Sim uArch: cycles simulated: 27000  inst.: 16182996 (ipc=599.4) sim_rate=269716 (inst/sec) elapsed = 0:0:01:00 / Mon Jun 14 16:17:47 2021
GPGPU-Sim PTX: 16300000 instructions simulated : ctaid=(5,10,0) tid=(3,6,0)
GPGPU-Sim PTX: 16400000 instructions simulated : ctaid=(13,10,0) tid=(7,0,0)
GPGPU-Sim PTX: 16500000 instructions simulated : ctaid=(14,13,0) tid=(5,7,0)
GPGPU-Sim uArch: cycles simulated: 27500  inst.: 16529016 (ipc=601.1) sim_rate=270967 (inst/sec) elapsed = 0:0:01:01 / Mon Jun 14 16:17:48 2021
GPGPU-Sim PTX: 16600000 instructions simulated : ctaid=(11,10,0) tid=(3,8,0)
GPGPU-Sim PTX: 16700000 instructions simulated : ctaid=(12,6,0) tid=(5,9,0)
GPGPU-Sim PTX: 16800000 instructions simulated : ctaid=(1,12,0) tid=(7,8,0)
GPGPU-Sim PTX: 16900000 instructions simulated : ctaid=(9,11,0) tid=(7,8,0)
GPGPU-Sim uArch: cycles simulated: 28000  inst.: 16859168 (ipc=602.1) sim_rate=267605 (inst/sec) elapsed = 0:0:01:03 / Mon Jun 14 16:17:50 2021
GPGPU-Sim PTX: 17000000 instructions simulated : ctaid=(5,5,0) tid=(5,5,0)
GPGPU-Sim PTX: 17100000 instructions simulated : ctaid=(7,11,0) tid=(1,9,0)
GPGPU-Sim PTX: 17200000 instructions simulated : ctaid=(6,7,0) tid=(1,1,0)
GPGPU-Sim uArch: cycles simulated: 28500  inst.: 17206480 (ipc=603.7) sim_rate=268851 (inst/sec) elapsed = 0:0:01:04 / Mon Jun 14 16:17:51 2021
GPGPU-Sim PTX: 17300000 instructions simulated : ctaid=(14,3,0) tid=(9,1,0)
GPGPU-Sim PTX: 17400000 instructions simulated : ctaid=(1,11,0) tid=(9,1,0)
GPGPU-Sim PTX: 17500000 instructions simulated : ctaid=(9,3,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 29000  inst.: 17530396 (ipc=604.5) sim_rate=269698 (inst/sec) elapsed = 0:0:01:05 / Mon Jun 14 16:17:52 2021
GPGPU-Sim PTX: 17600000 instructions simulated : ctaid=(8,5,0) tid=(9,7,0)
GPGPU-Sim PTX: 17700000 instructions simulated : ctaid=(9,2,0) tid=(1,3,0)
GPGPU-Sim PTX: 17800000 instructions simulated : ctaid=(11,11,0) tid=(5,9,0)
GPGPU-Sim PTX: 17900000 instructions simulated : ctaid=(14,8,0) tid=(7,0,0)
GPGPU-Sim uArch: cycles simulated: 29500  inst.: 17869380 (ipc=605.7) sim_rate=270748 (inst/sec) elapsed = 0:0:01:06 / Mon Jun 14 16:17:53 2021
GPGPU-Sim PTX: 18000000 instructions simulated : ctaid=(11,8,0) tid=(1,1,0)
GPGPU-Sim PTX: 18100000 instructions simulated : ctaid=(9,9,0) tid=(3,0,0)
GPGPU-Sim PTX: 18200000 instructions simulated : ctaid=(12,13,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 30000  inst.: 18174448 (ipc=605.8) sim_rate=271260 (inst/sec) elapsed = 0:0:01:07 / Mon Jun 14 16:17:54 2021
GPGPU-Sim PTX: 18300000 instructions simulated : ctaid=(4,13,0) tid=(3,8,0)
GPGPU-Sim PTX: 18400000 instructions simulated : ctaid=(1,14,0) tid=(9,1,0)
GPGPU-Sim PTX: 18500000 instructions simulated : ctaid=(0,0,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 30500  inst.: 18511332 (ipc=606.9) sim_rate=272225 (inst/sec) elapsed = 0:0:01:08 / Mon Jun 14 16:17:55 2021
GPGPU-Sim PTX: 18600000 instructions simulated : ctaid=(11,5,0) tid=(7,6,0)
GPGPU-Sim PTX: 18700000 instructions simulated : ctaid=(5,8,0) tid=(7,4,0)
GPGPU-Sim PTX: 18800000 instructions simulated : ctaid=(3,14,0) tid=(9,3,0)
GPGPU-Sim PTX: 18900000 instructions simulated : ctaid=(1,10,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 31000  inst.: 18869068 (ipc=608.7) sim_rate=273464 (inst/sec) elapsed = 0:0:01:09 / Mon Jun 14 16:17:56 2021
GPGPU-Sim PTX: 19000000 instructions simulated : ctaid=(11,0,0) tid=(9,3,0)
GPGPU-Sim PTX: 19100000 instructions simulated : ctaid=(1,4,0) tid=(7,8,0)
GPGPU-Sim PTX: 19200000 instructions simulated : ctaid=(4,9,0) tid=(9,3,0)
GPGPU-Sim uArch: cycles simulated: 31500  inst.: 19201680 (ipc=609.6) sim_rate=270446 (inst/sec) elapsed = 0:0:01:11 / Mon Jun 14 16:17:58 2021
GPGPU-Sim PTX: 19300000 instructions simulated : ctaid=(6,13,0) tid=(9,9,0)
GPGPU-Sim PTX: 19400000 instructions simulated : ctaid=(11,0,0) tid=(9,1,0)
GPGPU-Sim PTX: 19500000 instructions simulated : ctaid=(9,2,0) tid=(1,5,0)
GPGPU-Sim uArch: cycles simulated: 32000  inst.: 19534468 (ipc=610.5) sim_rate=271312 (inst/sec) elapsed = 0:0:01:12 / Mon Jun 14 16:17:59 2021
GPGPU-Sim PTX: 19600000 instructions simulated : ctaid=(8,2,0) tid=(1,5,0)
GPGPU-Sim PTX: 19700000 instructions simulated : ctaid=(11,14,0) tid=(7,6,0)
GPGPU-Sim PTX: 19800000 instructions simulated : ctaid=(12,2,0) tid=(7,0,0)
GPGPU-Sim uArch: cycles simulated: 32500  inst.: 19828596 (ipc=610.1) sim_rate=271624 (inst/sec) elapsed = 0:0:01:13 / Mon Jun 14 16:18:00 2021
GPGPU-Sim PTX: 19900000 instructions simulated : ctaid=(4,3,0) tid=(3,6,0)
GPGPU-Sim PTX: 20000000 instructions simulated : ctaid=(11,5,0) tid=(9,3,0)
GPGPU-Sim PTX: 20100000 instructions simulated : ctaid=(1,1,0) tid=(1,3,0)
GPGPU-Sim PTX: 20200000 instructions simulated : ctaid=(4,2,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 33000  inst.: 20172780 (ipc=611.3) sim_rate=272605 (inst/sec) elapsed = 0:0:01:14 / Mon Jun 14 16:18:01 2021
GPGPU-Sim PTX: 20300000 instructions simulated : ctaid=(9,5,0) tid=(1,1,0)
GPGPU-Sim PTX: 20400000 instructions simulated : ctaid=(10,10,0) tid=(9,5,0)
GPGPU-Sim PTX: 20500000 instructions simulated : ctaid=(9,4,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 33500  inst.: 20511528 (ipc=612.3) sim_rate=273487 (inst/sec) elapsed = 0:0:01:15 / Mon Jun 14 16:18:02 2021
GPGPU-Sim PTX: 20600000 instructions simulated : ctaid=(6,12,0) tid=(1,7,0)
GPGPU-Sim PTX: 20700000 instructions simulated : ctaid=(12,12,0) tid=(7,2,0)
GPGPU-Sim PTX: 20800000 instructions simulated : ctaid=(8,4,0) tid=(9,7,0)
GPGPU-Sim PTX: 20900000 instructions simulated : ctaid=(8,6,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 34000  inst.: 20866696 (ipc=613.7) sim_rate=274561 (inst/sec) elapsed = 0:0:01:16 / Mon Jun 14 16:18:03 2021
GPGPU-Sim PTX: 21000000 instructions simulated : ctaid=(8,9,0) tid=(5,9,0)
GPGPU-Sim PTX: 21100000 instructions simulated : ctaid=(7,2,0) tid=(9,1,0)
GPGPU-Sim PTX: 21200000 instructions simulated : ctaid=(1,3,0) tid=(5,3,0)
GPGPU-Sim uArch: cycles simulated: 34500  inst.: 21196192 (ipc=614.4) sim_rate=275275 (inst/sec) elapsed = 0:0:01:17 / Mon Jun 14 16:18:04 2021
GPGPU-Sim PTX: 21300000 instructions simulated : ctaid=(6,6,0) tid=(5,9,0)
GPGPU-Sim PTX: 21400000 instructions simulated : ctaid=(4,2,0) tid=(9,5,0)
GPGPU-Sim PTX: 21500000 instructions simulated : ctaid=(2,5,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 35000  inst.: 21528812 (ipc=615.1) sim_rate=276010 (inst/sec) elapsed = 0:0:01:18 / Mon Jun 14 16:18:05 2021
GPGPU-Sim PTX: 21600000 instructions simulated : ctaid=(12,14,0) tid=(1,3,0)
GPGPU-Sim PTX: 21700000 instructions simulated : ctaid=(11,9,0) tid=(9,7,0)
GPGPU-Sim PTX: 21800000 instructions simulated : ctaid=(5,14,0) tid=(3,0,0)
GPGPU-Sim PTX: 21900000 instructions simulated : ctaid=(12,3,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 35500  inst.: 21850644 (ipc=615.5) sim_rate=273133 (inst/sec) elapsed = 0:0:01:20 / Mon Jun 14 16:18:07 2021
GPGPU-Sim PTX: 22000000 instructions simulated : ctaid=(5,2,0) tid=(9,9,0)
GPGPU-Sim PTX: 22100000 instructions simulated : ctaid=(6,0,0) tid=(5,9,0)
GPGPU-Sim PTX: 22200000 instructions simulated : ctaid=(11,9,0) tid=(7,4,0)
GPGPU-Sim uArch: cycles simulated: 36000  inst.: 22180956 (ipc=616.1) sim_rate=273838 (inst/sec) elapsed = 0:0:01:21 / Mon Jun 14 16:18:08 2021
GPGPU-Sim PTX: 22300000 instructions simulated : ctaid=(2,1,0) tid=(5,3,0)
GPGPU-Sim PTX: 22400000 instructions simulated : ctaid=(1,5,0) tid=(5,3,0)
GPGPU-Sim PTX: 22500000 instructions simulated : ctaid=(2,4,0) tid=(7,0,0)
GPGPU-Sim uArch: cycles simulated: 36500  inst.: 22532600 (ipc=617.3) sim_rate=274787 (inst/sec) elapsed = 0:0:01:22 / Mon Jun 14 16:18:09 2021
GPGPU-Sim PTX: 22600000 instructions simulated : ctaid=(14,9,0) tid=(1,7,0)
GPGPU-Sim PTX: 22700000 instructions simulated : ctaid=(2,12,0) tid=(1,7,0)
GPGPU-Sim PTX: 22800000 instructions simulated : ctaid=(5,14,0) tid=(9,5,0)
GPGPU-Sim PTX: 22900000 instructions simulated : ctaid=(4,0,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 37000  inst.: 22872968 (ipc=618.2) sim_rate=275577 (inst/sec) elapsed = 0:0:01:23 / Mon Jun 14 16:18:10 2021
GPGPU-Sim PTX: 23000000 instructions simulated : ctaid=(13,13,0) tid=(7,2,0)
GPGPU-Sim PTX: 23100000 instructions simulated : ctaid=(6,3,0) tid=(3,0,0)
GPGPU-Sim PTX: 23200000 instructions simulated : ctaid=(3,10,0) tid=(3,6,0)
GPGPU-Sim uArch: cycles simulated: 37500  inst.: 23195320 (ipc=618.5) sim_rate=276134 (inst/sec) elapsed = 0:0:01:24 / Mon Jun 14 16:18:11 2021
GPGPU-Sim PTX: 23300000 instructions simulated : ctaid=(8,6,0) tid=(9,1,0)
GPGPU-Sim PTX: 23400000 instructions simulated : ctaid=(1,0,0) tid=(9,7,0)
GPGPU-Sim PTX: 23500000 instructions simulated : ctaid=(0,10,0) tid=(7,8,0)
GPGPU-Sim uArch: cycles simulated: 38000  inst.: 23505996 (ipc=618.6) sim_rate=276541 (inst/sec) elapsed = 0:0:01:25 / Mon Jun 14 16:18:12 2021
GPGPU-Sim PTX: 23600000 instructions simulated : ctaid=(5,11,0) tid=(1,1,0)
GPGPU-Sim PTX: 23700000 instructions simulated : ctaid=(13,0,0) tid=(9,3,0)
GPGPU-Sim PTX: 23800000 instructions simulated : ctaid=(1,10,0) tid=(9,1,0)
GPGPU-Sim uArch: cycles simulated: 38500  inst.: 23837924 (ipc=619.2) sim_rate=277185 (inst/sec) elapsed = 0:0:01:26 / Mon Jun 14 16:18:13 2021
GPGPU-Sim PTX: 23900000 instructions simulated : ctaid=(4,3,0) tid=(5,1,0)
GPGPU-Sim PTX: 24000000 instructions simulated : ctaid=(13,7,0) tid=(1,9,0)
GPGPU-Sim PTX: 24100000 instructions simulated : ctaid=(6,1,0) tid=(9,7,0)
GPGPU-Sim PTX: 24200000 instructions simulated : ctaid=(5,11,0) tid=(5,3,0)
GPGPU-Sim uArch: cycles simulated: 39000  inst.: 24174400 (ipc=619.9) sim_rate=277866 (inst/sec) elapsed = 0:0:01:27 / Mon Jun 14 16:18:14 2021
GPGPU-Sim PTX: 24300000 instructions simulated : ctaid=(13,6,0) tid=(7,2,0)
GPGPU-Sim PTX: 24400000 instructions simulated : ctaid=(5,7,0) tid=(5,1,0)
GPGPU-Sim PTX: 24500000 instructions simulated : ctaid=(10,14,0) tid=(7,6,0)
GPGPU-Sim uArch: cycles simulated: 39500  inst.: 24517452 (ipc=620.7) sim_rate=275476 (inst/sec) elapsed = 0:0:01:29 / Mon Jun 14 16:18:16 2021
GPGPU-Sim PTX: 24600000 instructions simulated : ctaid=(7,7,0) tid=(1,9,0)
GPGPU-Sim PTX: 24700000 instructions simulated : ctaid=(0,8,0) tid=(3,8,0)
GPGPU-Sim PTX: 24800000 instructions simulated : ctaid=(8,8,0) tid=(1,5,0)
GPGPU-Sim uArch: cycles simulated: 40000  inst.: 24838280 (ipc=621.0) sim_rate=275980 (inst/sec) elapsed = 0:0:01:30 / Mon Jun 14 16:18:17 2021
GPGPU-Sim PTX: 24900000 instructions simulated : ctaid=(13,2,0) tid=(1,9,0)
GPGPU-Sim PTX: 25000000 instructions simulated : ctaid=(10,0,0) tid=(3,4,0)
GPGPU-Sim PTX: 25100000 instructions simulated : ctaid=(4,5,0) tid=(9,1,0)
GPGPU-Sim PTX: 25200000 instructions simulated : ctaid=(11,13,0) tid=(5,1,0)
GPGPU-Sim uArch: cycles simulated: 40500  inst.: 25182424 (ipc=621.8) sim_rate=276729 (inst/sec) elapsed = 0:0:01:31 / Mon Jun 14 16:18:18 2021
GPGPU-Sim PTX: 25300000 instructions simulated : ctaid=(14,12,0) tid=(1,1,0)
GPGPU-Sim PTX: 25400000 instructions simulated : ctaid=(4,3,0) tid=(1,1,0)
GPGPU-Sim PTX: 25500000 instructions simulated : ctaid=(6,10,0) tid=(3,6,0)
GPGPU-Sim uArch: cycles simulated: 41000  inst.: 25501984 (ipc=622.0) sim_rate=277195 (inst/sec) elapsed = 0:0:01:32 / Mon Jun 14 16:18:19 2021
GPGPU-Sim PTX: 25600000 instructions simulated : ctaid=(14,0,0) tid=(3,6,0)
GPGPU-Sim PTX: 25700000 instructions simulated : ctaid=(7,3,0) tid=(7,2,0)
GPGPU-Sim PTX: 25800000 instructions simulated : ctaid=(1,8,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 41500  inst.: 25839444 (ipc=622.6) sim_rate=277843 (inst/sec) elapsed = 0:0:01:33 / Mon Jun 14 16:18:20 2021
GPGPU-Sim PTX: 25900000 instructions simulated : ctaid=(13,4,0) tid=(7,0,0)
GPGPU-Sim PTX: 26000000 instructions simulated : ctaid=(12,14,0) tid=(5,5,0)
GPGPU-Sim PTX: 26100000 instructions simulated : ctaid=(8,14,0) tid=(1,7,0)
GPGPU-Sim PTX: 26200000 instructions simulated : ctaid=(8,0,0) tid=(3,6,0)
GPGPU-Sim uArch: cycles simulated: 42000  inst.: 26158016 (ipc=622.8) sim_rate=278276 (inst/sec) elapsed = 0:0:01:34 / Mon Jun 14 16:18:21 2021
GPGPU-Sim PTX: 26300000 instructions simulated : ctaid=(6,9,0) tid=(9,1,0)
GPGPU-Sim PTX: 26400000 instructions simulated : ctaid=(4,6,0) tid=(9,9,0)
GPGPU-Sim PTX: 26500000 instructions simulated : ctaid=(7,14,0) tid=(7,8,0)
GPGPU-Sim uArch: cycles simulated: 42500  inst.: 26488808 (ipc=623.3) sim_rate=278829 (inst/sec) elapsed = 0:0:01:35 / Mon Jun 14 16:18:22 2021
GPGPU-Sim PTX: 26600000 instructions simulated : ctaid=(0,1,0) tid=(3,4,0)
GPGPU-Sim PTX: 26700000 instructions simulated : ctaid=(13,1,0) tid=(5,1,0)
GPGPU-Sim PTX: 26800000 instructions simulated : ctaid=(14,1,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 43000  inst.: 26799472 (ipc=623.2) sim_rate=279161 (inst/sec) elapsed = 0:0:01:36 / Mon Jun 14 16:18:23 2021
GPGPU-Sim PTX: 26900000 instructions simulated : ctaid=(14,5,0) tid=(1,9,0)
GPGPU-Sim PTX: 27000000 instructions simulated : ctaid=(9,8,0) tid=(5,7,0)
GPGPU-Sim PTX: 27100000 instructions simulated : ctaid=(3,1,0) tid=(5,7,0)
GPGPU-Sim uArch: cycles simulated: 43500  inst.: 27137944 (ipc=623.9) sim_rate=276917 (inst/sec) elapsed = 0:0:01:38 / Mon Jun 14 16:18:25 2021
GPGPU-Sim PTX: 27200000 instructions simulated : ctaid=(5,13,0) tid=(7,0,0)
GPGPU-Sim PTX: 27300000 instructions simulated : ctaid=(11,8,0) tid=(7,6,0)
GPGPU-Sim PTX: 27400000 instructions simulated : ctaid=(8,1,0) tid=(9,3,0)
GPGPU-Sim PTX: 27500000 instructions simulated : ctaid=(10,4,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 44000  inst.: 27461176 (ipc=624.1) sim_rate=277385 (inst/sec) elapsed = 0:0:01:39 / Mon Jun 14 16:18:26 2021
GPGPU-Sim PTX: 27600000 instructions simulated : ctaid=(3,4,0) tid=(5,5,0)
GPGPU-Sim PTX: 27700000 instructions simulated : ctaid=(6,2,0) tid=(3,2,0)
GPGPU-Sim PTX: 27800000 instructions simulated : ctaid=(0,8,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 44500  inst.: 27822080 (ipc=625.2) sim_rate=278220 (inst/sec) elapsed = 0:0:01:40 / Mon Jun 14 16:18:27 2021
GPGPU-Sim PTX: 27900000 instructions simulated : ctaid=(6,10,0) tid=(5,7,0)
GPGPU-Sim PTX: 28000000 instructions simulated : ctaid=(9,6,0) tid=(1,3,0)
GPGPU-Sim PTX: 28100000 instructions simulated : ctaid=(10,3,0) tid=(5,3,0)
GPGPU-Sim uArch: cycles simulated: 45000  inst.: 28146700 (ipc=625.5) sim_rate=278680 (inst/sec) elapsed = 0:0:01:41 / Mon Jun 14 16:18:28 2021
GPGPU-Sim PTX: 28200000 instructions simulated : ctaid=(9,2,0) tid=(1,1,0)
GPGPU-Sim PTX: 28300000 instructions simulated : ctaid=(10,6,0) tid=(5,5,0)
GPGPU-Sim PTX: 28400000 instructions simulated : ctaid=(2,0,0) tid=(1,5,0)
GPGPU-Sim PTX: 28500000 instructions simulated : ctaid=(4,14,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 45500  inst.: 28480056 (ipc=625.9) sim_rate=279216 (inst/sec) elapsed = 0:0:01:42 / Mon Jun 14 16:18:29 2021
GPGPU-Sim PTX: 28600000 instructions simulated : ctaid=(8,11,0) tid=(7,4,0)
GPGPU-Sim PTX: 28700000 instructions simulated : ctaid=(13,7,0) tid=(9,5,0)
GPGPU-Sim PTX: 28800000 instructions simulated : ctaid=(8,10,0) tid=(9,1,0)
GPGPU-Sim uArch: cycles simulated: 46000  inst.: 28803560 (ipc=626.2) sim_rate=279646 (inst/sec) elapsed = 0:0:01:43 / Mon Jun 14 16:18:30 2021
GPGPU-Sim PTX: 28900000 instructions simulated : ctaid=(14,4,0) tid=(1,9,0)
GPGPU-Sim PTX: 29000000 instructions simulated : ctaid=(7,5,0) tid=(9,1,0)
GPGPU-Sim PTX: 29100000 instructions simulated : ctaid=(11,1,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 46500  inst.: 29129700 (ipc=626.4) sim_rate=280093 (inst/sec) elapsed = 0:0:01:44 / Mon Jun 14 16:18:31 2021
GPGPU-Sim PTX: 29200000 instructions simulated : ctaid=(2,9,0) tid=(9,1,0)
GPGPU-Sim PTX: 29300000 instructions simulated : ctaid=(14,6,0) tid=(3,8,0)
GPGPU-Sim PTX: 29400000 instructions simulated : ctaid=(12,14,0) tid=(5,1,0)
GPGPU-Sim PTX: 29500000 instructions simulated : ctaid=(8,14,0) tid=(3,2,0)
GPGPU-Sim uArch: cycles simulated: 47000  inst.: 29456612 (ipc=626.7) sim_rate=280539 (inst/sec) elapsed = 0:0:01:45 / Mon Jun 14 16:18:32 2021
GPGPU-Sim PTX: 29600000 instructions simulated : ctaid=(12,14,0) tid=(7,0,0)
GPGPU-Sim PTX: 29700000 instructions simulated : ctaid=(8,9,0) tid=(5,9,0)
GPGPU-Sim PTX: 29800000 instructions simulated : ctaid=(7,13,0) tid=(3,4,0)
GPGPU-Sim uArch: cycles simulated: 47500  inst.: 29786252 (ipc=627.1) sim_rate=281002 (inst/sec) elapsed = 0:0:01:46 / Mon Jun 14 16:18:33 2021
GPGPU-Sim PTX: 29900000 instructions simulated : ctaid=(5,14,0) tid=(7,0,0)
GPGPU-Sim PTX: 30000000 instructions simulated : ctaid=(12,9,0) tid=(3,8,0)
GPGPU-Sim PTX: 30100000 instructions simulated : ctaid=(7,9,0) tid=(9,9,0)
GPGPU-Sim uArch: cycles simulated: 48000  inst.: 30128620 (ipc=627.7) sim_rate=278968 (inst/sec) elapsed = 0:0:01:48 / Mon Jun 14 16:18:35 2021
GPGPU-Sim PTX: 30200000 instructions simulated : ctaid=(7,3,0) tid=(7,2,0)
GPGPU-Sim PTX: 30300000 instructions simulated : ctaid=(5,5,0) tid=(5,1,0)
GPGPU-Sim PTX: 30400000 instructions simulated : ctaid=(8,2,0) tid=(5,5,0)
GPGPU-Sim PTX: 30500000 instructions simulated : ctaid=(5,7,0) tid=(7,0,0)
GPGPU-Sim uArch: cycles simulated: 48500  inst.: 30458644 (ipc=628.0) sim_rate=279437 (inst/sec) elapsed = 0:0:01:49 / Mon Jun 14 16:18:36 2021
GPGPU-Sim PTX: 30600000 instructions simulated : ctaid=(8,9,0) tid=(9,3,0)
GPGPU-Sim PTX: 30700000 instructions simulated : ctaid=(11,6,0) tid=(9,7,0)
GPGPU-Sim PTX: 30800000 instructions simulated : ctaid=(14,14,0) tid=(5,3,0)
GPGPU-Sim uArch: cycles simulated: 49000  inst.: 30791028 (ipc=628.4) sim_rate=279918 (inst/sec) elapsed = 0:0:01:50 / Mon Jun 14 16:18:37 2021
GPGPU-Sim PTX: 30900000 instructions simulated : ctaid=(1,13,0) tid=(7,2,0)
GPGPU-Sim PTX: 31000000 instructions simulated : ctaid=(5,13,0) tid=(3,8,0)
GPGPU-Sim PTX: 31100000 instructions simulated : ctaid=(14,8,0) tid=(9,7,0)
GPGPU-Sim uArch: cycles simulated: 49500  inst.: 31060536 (ipc=627.5) sim_rate=279824 (inst/sec) elapsed = 0:0:01:51 / Mon Jun 14 16:18:38 2021
GPGPU-Sim uArch: cycles simulated: 50000  inst.: 31139460 (ipc=622.8) sim_rate=278030 (inst/sec) elapsed = 0:0:01:52 / Mon Jun 14 16:18:39 2021
GPGPU-Sim PTX: 31200000 instructions simulated : ctaid=(11,12,0) tid=(1,5,0)
GPGPU-Sim uArch: Shader 52 finished CTA #0 (50169,0), 1 CTAs running
GPGPU-Sim uArch: Shader 9 finished CTA #0 (50210,0), 1 CTAs running
GPGPU-Sim uArch: Shader 52 finished CTA #1 (50232,0), 0 CTAs running
GPGPU-Sim uArch: Shader 52 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 28 finished CTA #0 (50379,0), 1 CTAs running
GPGPU-Sim uArch: Shader 42 finished CTA #0 (50398,0), 1 CTAs running
GPGPU-Sim uArch: Shader 57 finished CTA #0 (50496,0), 1 CTAs running
GPGPU-Sim uArch: cycles simulated: 50500  inst.: 31175264 (ipc=617.3) sim_rate=275887 (inst/sec) elapsed = 0:0:01:53 / Mon Jun 14 16:18:40 2021
GPGPU-Sim uArch: Shader 65 finished CTA #0 (50597,0), 1 CTAs running
GPGPU-Sim uArch: Shader 104 finished CTA #0 (50638,0), 1 CTAs running
GPGPU-Sim uArch: Shader 30 finished CTA #1 (50665,0), 1 CTAs running
GPGPU-Sim uArch: Shader 112 finished CTA #0 (50754,0), 1 CTAs running
GPGPU-Sim uArch: Shader 28 finished CTA #1 (50810,0), 0 CTAs running
GPGPU-Sim uArch: Shader 28 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 57 finished CTA #1 (50829,0), 0 CTAs running
GPGPU-Sim uArch: Shader 57 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 75 finished CTA #0 (50837,0), 1 CTAs running
GPGPU-Sim uArch: Shader 106 finished CTA #0 (50839,0), 1 CTAs running
GPGPU-Sim uArch: Shader 60 finished CTA #0 (50843,0), 1 CTAs running
GPGPU-Sim uArch: Shader 47 finished CTA #0 (50845,0), 0 CTAs running
GPGPU-Sim uArch: Shader 47 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 88 finished CTA #0 (50861,0), 1 CTAs running
GPGPU-Sim uArch: Shader 9 finished CTA #1 (50862,0), 0 CTAs running
GPGPU-Sim uArch: Shader 9 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 88 finished CTA #1 (50894,0), 0 CTAs running
GPGPU-Sim uArch: Shader 88 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 112 finished CTA #1 (50899,0), 0 CTAs running
GPGPU-Sim uArch: Shader 112 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 87 finished CTA #0 (50903,0), 0 CTAs running
GPGPU-Sim uArch: Shader 87 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 60 finished CTA #1 (50905,0), 0 CTAs running
GPGPU-Sim uArch: Shader 60 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 55 finished CTA #0 (50915,0), 0 CTAs running
GPGPU-Sim uArch: Shader 55 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 30 finished CTA #0 (50933,0), 0 CTAs running
GPGPU-Sim uArch: Shader 30 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 32 finished CTA #0 (50948,0), 1 CTAs running
GPGPU-Sim uArch: Shader 39 finished CTA #0 (50953,0), 0 CTAs running
GPGPU-Sim uArch: Shader 39 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 73 finished CTA #0 (50959,0), 1 CTAs running
GPGPU-Sim uArch: Shader 114 finished CTA #0 (50971,0), 1 CTAs running
GPGPU-Sim uArch: Shader 80 finished CTA #0 (50985,0), 1 CTAs running
GPGPU-Sim uArch: Shader 0 finished CTA #0 (50990,0), 1 CTAs running
GPGPU-Sim uArch: Shader 29 finished CTA #1 (51027,0), 1 CTAs running
GPGPU-Sim uArch: Shader 23 finished CTA #0 (51028,0), 0 CTAs running
GPGPU-Sim uArch: Shader 23 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 49 finished CTA #0 (51028,0), 1 CTAs running
GPGPU-Sim uArch: Shader 63 finished CTA #0 (51035,0), 0 CTAs running
GPGPU-Sim uArch: Shader 63 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 29 finished CTA #0 (51056,0), 0 CTAs running
GPGPU-Sim uArch: Shader 29 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 86 finished CTA #0 (51059,0), 1 CTAs running
GPGPU-Sim uArch: Shader 32 finished CTA #1 (51065,0), 0 CTAs running
GPGPU-Sim uArch: Shader 32 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 17 finished CTA #0 (51066,0), 1 CTAs running
GPGPU-Sim uArch: Shader 96 finished CTA #0 (51082,0), 1 CTAs running
GPGPU-Sim uArch: Shader 73 finished CTA #1 (51111,0), 0 CTAs running
GPGPU-Sim uArch: Shader 73 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 89 finished CTA #1 (51114,0), 1 CTAs running
GPGPU-Sim uArch: Shader 74 finished CTA #0 (51119,0), 1 CTAs running
GPGPU-Sim uArch: Shader 34 finished CTA #0 (51138,0), 1 CTAs running
GPGPU-Sim uArch: Shader 94 finished CTA #0 (51144,0), 1 CTAs running
GPGPU-Sim uArch: Shader 17 finished CTA #1 (51149,0), 0 CTAs running
GPGPU-Sim uArch: Shader 17 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 92 finished CTA #0 (51157,0), 1 CTAs running
GPGPU-Sim uArch: Shader 109 finished CTA #0 (51193,0), 1 CTAs running
GPGPU-Sim uArch: Shader 75 finished CTA #1 (51194,0), 0 CTAs running
GPGPU-Sim uArch: Shader 75 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 90 finished CTA #0 (51200,0), 1 CTAs running
GPGPU-Sim uArch: Shader 15 finished CTA #0 (51206,0), 0 CTAs running
GPGPU-Sim uArch: Shader 15 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 26 finished CTA #0 (51206,0), 1 CTAs running
GPGPU-Sim uArch: Shader 72 finished CTA #0 (51210,0), 1 CTAs running
GPGPU-Sim uArch: Shader 89 finished CTA #0 (51216,0), 0 CTAs running
GPGPU-Sim uArch: Shader 89 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 41 finished CTA #0 (51219,0), 1 CTAs running
GPGPU-Sim uArch: Shader 2 finished CTA #0 (51227,0), 1 CTAs running
GPGPU-Sim uArch: Shader 53 finished CTA #1 (51230,0), 1 CTAs running
GPGPU-Sim uArch: Shader 37 finished CTA #0 (51232,0), 1 CTAs running
GPGPU-Sim uArch: Shader 56 finished CTA #0 (51241,0), 1 CTAs running
GPGPU-Sim uArch: Shader 50 finished CTA #0 (51243,0), 1 CTAs running
GPGPU-Sim uArch: Shader 65 finished CTA #1 (51243,0), 0 CTAs running
GPGPU-Sim uArch: Shader 65 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 10 finished CTA #0 (51255,0), 1 CTAs running
GPGPU-Sim uArch: Shader 16 finished CTA #0 (51260,0), 1 CTAs running
GPGPU-Sim uArch: Shader 96 finished CTA #1 (51266,0), 0 CTAs running
GPGPU-Sim uArch: Shader 96 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 66 finished CTA #0 (51293,0), 1 CTAs running
GPGPU-Sim uArch: Shader 81 finished CTA #0 (51296,0), 1 CTAs running
GPGPU-Sim uArch: Shader 62 finished CTA #1 (51304,0), 1 CTAs running
GPGPU-Sim uArch: Shader 41 finished CTA #1 (51308,0), 0 CTAs running
GPGPU-Sim uArch: Shader 41 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 114 finished CTA #1 (51320,0), 0 CTAs running
GPGPU-Sim uArch: Shader 114 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 71 finished CTA #0 (51330,0), 0 CTAs running
GPGPU-Sim uArch: Shader 71 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 61 finished CTA #0 (51340,0), 1 CTAs running
GPGPU-Sim uArch: Shader 58 finished CTA #1 (51345,0), 1 CTAs running
GPGPU-Sim uArch: Shader 103 finished CTA #0 (51352,0), 0 CTAs running
GPGPU-Sim uArch: Shader 103 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 80 finished CTA #1 (51361,0), 0 CTAs running
GPGPU-Sim uArch: Shader 80 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 72 finished CTA #1 (51366,0), 0 CTAs running
GPGPU-Sim uArch: Shader 72 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 0 finished CTA #1 (51368,0), 0 CTAs running
GPGPU-Sim uArch: Shader 0 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 49 finished CTA #1 (51383,0), 0 CTAs running
GPGPU-Sim uArch: Shader 49 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 25 finished CTA #0 (51386,0), 1 CTAs running
GPGPU-Sim uArch: Shader 26 finished CTA #1 (51391,0), 0 CTAs running
GPGPU-Sim uArch: Shader 26 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 94 finished CTA #1 (51399,0), 0 CTAs running
GPGPU-Sim uArch: Shader 94 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 13 finished CTA #0 (51400,0), 1 CTAs running
GPGPU-Sim uArch: Shader 2 finished CTA #1 (51406,0), 0 CTAs running
GPGPU-Sim uArch: Shader 2 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 62 finished CTA #0 (51412,0), 0 CTAs running
GPGPU-Sim uArch: Shader 62 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 91 finished CTA #0 (51415,0), 1 CTAs running
GPGPU-Sim uArch: Shader 90 finished CTA #1 (51420,0), 0 CTAs running
GPGPU-Sim uArch: Shader 90 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 54 finished CTA #0 (51425,0), 1 CTAs running
GPGPU-Sim uArch: Shader 16 finished CTA #1 (51430,0), 0 CTAs running
GPGPU-Sim uArch: Shader 16 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 102 finished CTA #0 (51434,0), 1 CTAs running
GPGPU-Sim uArch: Shader 21 finished CTA #0 (51436,0), 1 CTAs running
GPGPU-Sim uArch: Shader 98 finished CTA #0 (51438,0), 1 CTAs running
GPGPU-Sim uArch: Shader 81 finished CTA #1 (51439,0), 0 CTAs running
GPGPU-Sim uArch: Shader 81 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 31 finished CTA #0 (51446,0), 0 CTAs running
GPGPU-Sim uArch: Shader 31 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 91 finished CTA #1 (51452,0), 0 CTAs running
GPGPU-Sim uArch: Shader 91 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 82 finished CTA #0 (51455,0), 1 CTAs running
GPGPU-Sim uArch: Shader 8 finished CTA #0 (51456,0), 1 CTAs running
GPGPU-Sim uArch: Shader 56 finished CTA #1 (51461,0), 0 CTAs running
GPGPU-Sim uArch: Shader 56 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 95 finished CTA #0 (51462,0), 0 CTAs running
GPGPU-Sim uArch: Shader 95 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 58 finished CTA #0 (51482,0), 0 CTAs running
GPGPU-Sim uArch: Shader 58 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 42 finished CTA #1 (51487,0), 0 CTAs running
GPGPU-Sim uArch: Shader 42 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 86 finished CTA #1 (51492,0), 0 CTAs running
GPGPU-Sim uArch: Shader 86 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 13 finished CTA #1 (51494,0), 0 CTAs running
GPGPU-Sim uArch: Shader 13 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: cycles simulated: 51500  inst.: 31182868 (ipc=605.5) sim_rate=273533 (inst/sec) elapsed = 0:0:01:54 / Mon Jun 14 16:18:41 2021
GPGPU-Sim uArch: Shader 64 finished CTA #1 (51500,0), 1 CTAs running
GPGPU-Sim uArch: Shader 107 finished CTA #0 (51500,0), 1 CTAs running
GPGPU-Sim uArch: Shader 99 finished CTA #1 (51503,0), 1 CTAs running
GPGPU-Sim uArch: Shader 10 finished CTA #1 (51508,0), 0 CTAs running
GPGPU-Sim uArch: Shader 10 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 20 finished CTA #0 (51509,0), 1 CTAs running
GPGPU-Sim uArch: Shader 76 finished CTA #1 (51516,0), 1 CTAs running
GPGPU-Sim uArch: Shader 82 finished CTA #1 (51518,0), 0 CTAs running
GPGPU-Sim uArch: Shader 82 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 83 finished CTA #0 (51528,0), 1 CTAs running
GPGPU-Sim uArch: Shader 99 finished CTA #0 (51530,0), 0 CTAs running
GPGPU-Sim uArch: Shader 99 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 107 finished CTA #1 (51533,0), 0 CTAs running
GPGPU-Sim uArch: Shader 107 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 59 finished CTA #0 (51536,0), 1 CTAs running
GPGPU-Sim uArch: Shader 33 finished CTA #1 (51538,0), 1 CTAs running
GPGPU-Sim uArch: Shader 106 finished CTA #1 (51549,0), 0 CTAs running
GPGPU-Sim uArch: Shader 106 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 18 finished CTA #0 (51553,0), 1 CTAs running
GPGPU-Sim uArch: Shader 35 finished CTA #0 (51559,0), 1 CTAs running
GPGPU-Sim uArch: Shader 61 finished CTA #1 (51560,0), 0 CTAs running
GPGPU-Sim uArch: Shader 61 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 40 finished CTA #1 (51569,0), 1 CTAs running
GPGPU-Sim uArch: Shader 76 finished CTA #0 (51582,0), 0 CTAs running
GPGPU-Sim uArch: Shader 76 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 83 finished CTA #1 (51582,0), 0 CTAs running
GPGPU-Sim uArch: Shader 83 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 25 finished CTA #1 (51588,0), 0 CTAs running
GPGPU-Sim uArch: Shader 25 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 3 finished CTA #0 (51593,0), 1 CTAs running
GPGPU-Sim uArch: Shader 27 finished CTA #0 (51594,0), 1 CTAs running
GPGPU-Sim uArch: Shader 33 finished CTA #0 (51602,0), 0 CTAs running
GPGPU-Sim uArch: Shader 33 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 48 finished CTA #1 (51614,0), 1 CTAs running
GPGPU-Sim uArch: Shader 119 finished CTA #0 (51624,0), 0 CTAs running
GPGPU-Sim uArch: Shader 119 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 11 finished CTA #0 (51626,0), 1 CTAs running
GPGPU-Sim uArch: Shader 84 finished CTA #0 (51628,0), 1 CTAs running
GPGPU-Sim uArch: Shader 97 finished CTA #0 (51634,0), 1 CTAs running
GPGPU-Sim uArch: Shader 18 finished CTA #1 (51639,0), 0 CTAs running
GPGPU-Sim uArch: Shader 18 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 93 finished CTA #0 (51640,0), 1 CTAs running
GPGPU-Sim uArch: Shader 78 finished CTA #0 (51650,0), 1 CTAs running
GPGPU-Sim uArch: Shader 22 finished CTA #0 (51652,0), 1 CTAs running
GPGPU-Sim uArch: Shader 36 finished CTA #0 (51656,0), 1 CTAs running
GPGPU-Sim uArch: Shader 108 finished CTA #0 (51657,0), 1 CTAs running
GPGPU-Sim uArch: Shader 6 finished CTA #0 (51660,0), 1 CTAs running
GPGPU-Sim uArch: Shader 40 finished CTA #0 (51662,0), 0 CTAs running
GPGPU-Sim uArch: Shader 40 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 84 finished CTA #1 (51666,0), 0 CTAs running
GPGPU-Sim uArch: Shader 84 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 92 finished CTA #1 (51668,0), 0 CTAs running
GPGPU-Sim uArch: Shader 92 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 115 finished CTA #0 (51668,0), 1 CTAs running
GPGPU-Sim uArch: Shader 24 finished CTA #1 (51670,0), 1 CTAs running
GPGPU-Sim uArch: Shader 24 finished CTA #0 (51674,0), 0 CTAs running
GPGPU-Sim uArch: Shader 24 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 3 finished CTA #1 (51678,0), 0 CTAs running
GPGPU-Sim uArch: Shader 3 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 59 finished CTA #1 (51683,0), 0 CTAs running
GPGPU-Sim uArch: Shader 59 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 8 finished CTA #1 (51687,0), 0 CTAs running
GPGPU-Sim uArch: Shader 8 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 115 finished CTA #1 (51688,0), 0 CTAs running
GPGPU-Sim uArch: Shader 115 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 85 finished CTA #0 (51689,0), 1 CTAs running
GPGPU-Sim uArch: Shader 111 finished CTA #0 (51699,0), 0 CTAs running
GPGPU-Sim uArch: Shader 111 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 116 finished CTA #0 (51699,0), 1 CTAs running
GPGPU-Sim uArch: Shader 34 finished CTA #1 (51701,0), 0 CTAs running
GPGPU-Sim uArch: Shader 34 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 54 finished CTA #1 (51715,0), 0 CTAs running
GPGPU-Sim uArch: Shader 54 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 104 finished CTA #1 (51715,0), 0 CTAs running
GPGPU-Sim uArch: Shader 104 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 48 finished CTA #0 (51730,0), 0 CTAs running
GPGPU-Sim uArch: Shader 48 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 20 finished CTA #1 (51739,0), 0 CTAs running
GPGPU-Sim uArch: Shader 20 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 11 finished CTA #1 (51741,0), 0 CTAs running
GPGPU-Sim uArch: Shader 11 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 97 finished CTA #1 (51748,0), 0 CTAs running
GPGPU-Sim uArch: Shader 97 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 113 finished CTA #0 (51749,0), 1 CTAs running
GPGPU-Sim uArch: Shader 113 finished CTA #1 (51751,0), 0 CTAs running
GPGPU-Sim uArch: Shader 113 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 27 finished CTA #1 (51754,0), 0 CTAs running
GPGPU-Sim uArch: Shader 27 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 53 finished CTA #0 (51754,0), 0 CTAs running
GPGPU-Sim uArch: Shader 53 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 105 finished CTA #0 (51756,0), 1 CTAs running
GPGPU-Sim uArch: Shader 43 finished CTA #0 (51767,0), 1 CTAs running
GPGPU-Sim uArch: Shader 36 finished CTA #1 (51771,0), 0 CTAs running
GPGPU-Sim uArch: Shader 36 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 105 finished CTA #1 (51776,0), 0 CTAs running
GPGPU-Sim uArch: Shader 105 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 45 finished CTA #0 (51783,0), 1 CTAs running
GPGPU-Sim uArch: Shader 93 finished CTA #1 (51788,0), 0 CTAs running
GPGPU-Sim uArch: Shader 93 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 67 finished CTA #0 (51790,0), 1 CTAs running
GPGPU-Sim uArch: Shader 50 finished CTA #1 (51794,0), 0 CTAs running
GPGPU-Sim uArch: Shader 50 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 67 finished CTA #1 (51796,0), 0 CTAs running
GPGPU-Sim uArch: Shader 67 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 7 finished CTA #0 (51797,0), 0 CTAs running
GPGPU-Sim uArch: Shader 7 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 19 finished CTA #0 (51797,0), 1 CTAs running
GPGPU-Sim uArch: Shader 74 finished CTA #1 (51800,0), 0 CTAs running
GPGPU-Sim uArch: Shader 74 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 43 finished CTA #1 (51801,0), 0 CTAs running
GPGPU-Sim uArch: Shader 43 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 102 finished CTA #1 (51802,0), 0 CTAs running
GPGPU-Sim uArch: Shader 102 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 79 finished CTA #0 (51810,0), 0 CTAs running
GPGPU-Sim uArch: Shader 79 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 98 finished CTA #1 (51811,0), 0 CTAs running
GPGPU-Sim uArch: Shader 98 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 85 finished CTA #1 (51813,0), 0 CTAs running
GPGPU-Sim uArch: Shader 85 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 51 finished CTA #0 (51824,0), 1 CTAs running
GPGPU-Sim uArch: Shader 66 finished CTA #1 (51833,0), 0 CTAs running
GPGPU-Sim uArch: Shader 66 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 12 finished CTA #0 (51840,0), 1 CTAs running
GPGPU-Sim uArch: Shader 35 finished CTA #1 (51840,0), 0 CTAs running
GPGPU-Sim uArch: Shader 35 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 44 finished CTA #0 (51841,0), 1 CTAs running
GPGPU-Sim uArch: Shader 101 finished CTA #1 (51842,0), 1 CTAs running
GPGPU-Sim uArch: Shader 64 finished CTA #0 (51847,0), 0 CTAs running
GPGPU-Sim uArch: Shader 64 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 37 finished CTA #1 (51851,0), 0 CTAs running
GPGPU-Sim uArch: Shader 37 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 77 finished CTA #0 (51851,0), 1 CTAs running
GPGPU-Sim uArch: Shader 51 finished CTA #1 (51855,0), 0 CTAs running
GPGPU-Sim uArch: Shader 51 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 109 finished CTA #1 (51857,0), 0 CTAs running
GPGPU-Sim uArch: Shader 109 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 12 finished CTA #1 (51860,0), 0 CTAs running
GPGPU-Sim uArch: Shader 12 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 6 finished CTA #1 (51862,0), 0 CTAs running
GPGPU-Sim uArch: Shader 6 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 116 finished CTA #1 (51867,0), 0 CTAs running
GPGPU-Sim uArch: Shader 116 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 101 finished CTA #0 (51870,0), 0 CTAs running
GPGPU-Sim uArch: Shader 101 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 117 finished CTA #1 (51871,0), 1 CTAs running
GPGPU-Sim uArch: Shader 38 finished CTA #0 (51881,0), 1 CTAs running
GPGPU-Sim uArch: Shader 19 finished CTA #1 (51882,0), 0 CTAs running
GPGPU-Sim uArch: Shader 19 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 38 finished CTA #1 (51887,0), 0 CTAs running
GPGPU-Sim uArch: Shader 38 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 117 finished CTA #0 (51887,0), 0 CTAs running
GPGPU-Sim uArch: Shader 117 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 1 finished CTA #0 (51902,0), 1 CTAs running
GPGPU-Sim uArch: Shader 44 finished CTA #1 (51906,0), 0 CTAs running
GPGPU-Sim uArch: Shader 44 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 14 finished CTA #0 (51907,0), 1 CTAs running
GPGPU-Sim uArch: Shader 21 finished CTA #1 (51908,0), 0 CTAs running
GPGPU-Sim uArch: Shader 21 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 14 finished CTA #1 (51913,0), 0 CTAs running
GPGPU-Sim uArch: Shader 14 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 1 finished CTA #1 (51923,0), 0 CTAs running
GPGPU-Sim uArch: Shader 1 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 22 finished CTA #1 (51928,0), 0 CTAs running
GPGPU-Sim uArch: Shader 22 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 100 finished CTA #0 (51933,0), 1 CTAs running
GPGPU-Sim uArch: Shader 4 finished CTA #0 (51939,0), 1 CTAs running
GPGPU-Sim uArch: Shader 110 finished CTA #0 (51939,0), 1 CTAs running
GPGPU-Sim uArch: Shader 78 finished CTA #1 (51944,0), 0 CTAs running
GPGPU-Sim uArch: Shader 78 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 108 finished CTA #1 (51951,0), 0 CTAs running
GPGPU-Sim uArch: Shader 108 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 45 finished CTA #1 (51964,0), 0 CTAs running
GPGPU-Sim uArch: Shader 45 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 68 finished CTA #0 (51966,0), 1 CTAs running
GPGPU-Sim uArch: Shader 77 finished CTA #1 (51966,0), 0 CTAs running
GPGPU-Sim uArch: Shader 77 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 68 finished CTA #1 (51969,0), 0 CTAs running
GPGPU-Sim uArch: Shader 68 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 118 finished CTA #0 (51986,0), 1 CTAs running
GPGPU-Sim uArch: cycles simulated: 52000  inst.: 31185000 (ipc=599.7) sim_rate=271173 (inst/sec) elapsed = 0:0:01:55 / Mon Jun 14 16:18:42 2021
GPGPU-Sim uArch: Shader 70 finished CTA #0 (52000,0), 1 CTAs running
GPGPU-Sim uArch: Shader 46 finished CTA #0 (52004,0), 1 CTAs running
GPGPU-Sim uArch: Shader 4 finished CTA #1 (52012,0), 0 CTAs running
GPGPU-Sim uArch: Shader 4 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 100 finished CTA #1 (52016,0), 0 CTAs running
GPGPU-Sim uArch: Shader 100 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 110 finished CTA #1 (52020,0), 0 CTAs running
GPGPU-Sim uArch: Shader 110 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 118 finished CTA #1 (52024,0), 0 CTAs running
GPGPU-Sim uArch: Shader 118 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 5 finished CTA #0 (52044,0), 1 CTAs running
GPGPU-Sim uArch: Shader 46 finished CTA #1 (52044,0), 0 CTAs running
GPGPU-Sim uArch: Shader 46 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 69 finished CTA #0 (52048,0), 1 CTAs running
GPGPU-Sim uArch: Shader 5 finished CTA #1 (52061,0), 0 CTAs running
GPGPU-Sim uArch: Shader 5 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 69 finished CTA #1 (52107,0), 0 CTAs running
GPGPU-Sim uArch: Shader 69 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 70 finished CTA #1 (52113,0), 0 CTAs running
GPGPU-Sim uArch: Shader 70 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: GPU detected kernel '_Z14matrix_mul_gpuPiS_S_i' finished on shader 70.
kernel_name = _Z14matrix_mul_gpuPiS_S_i 
kernel_launch_uid = 1 
gpu_sim_cycle = 52114
gpu_sim_insn = 31185000
gpu_ipc =     598.3997
gpu_tot_sim_cycle = 52114
gpu_tot_sim_insn = 31185000
gpu_tot_ipc =     598.3997
gpu_tot_issued_cta = 0
gpu_stall_dramfull = 74963
gpu_stall_icnt2sh    = 150961
gpu_total_sim_rate=271173

========= Core cache stats =========
L1I_cache:
	L1I_total_cache_accesses = 696598
	L1I_total_cache_misses = 3598
	L1I_total_cache_miss_rate = 0.0052
	L1I_total_cache_pending_hits = 0
	L1I_total_cache_reservation_fails = 0
L1D_cache:
	L1D_cache_core[0]: Access = 40337, Miss = 2556, Miss_rate = 0.063, Pending_hits = 9587, Reservation_fails = 17954
	L1D_cache_core[1]: Access = 40291, Miss = 2547, Miss_rate = 0.063, Pending_hits = 9564, Reservation_fails = 17242
	L1D_cache_core[2]: Access = 40297, Miss = 2561, Miss_rate = 0.064, Pending_hits = 9568, Reservation_fails = 19637
	L1D_cache_core[3]: Access = 40231, Miss = 2551, Miss_rate = 0.063, Pending_hits = 9512, Reservation_fails = 20346
	L1D_cache_core[4]: Access = 40282, Miss = 2570, Miss_rate = 0.064, Pending_hits = 9553, Reservation_fails = 20125
	L1D_cache_core[5]: Access = 40276, Miss = 2563, Miss_rate = 0.064, Pending_hits = 9550, Reservation_fails = 21507
	L1D_cache_core[6]: Access = 40282, Miss = 2561, Miss_rate = 0.064, Pending_hits = 9549, Reservation_fails = 17583
	L1D_cache_core[7]: Access = 40322, Miss = 2555, Miss_rate = 0.063, Pending_hits = 9582, Reservation_fails = 19592
	L1D_cache_core[8]: Access = 40328, Miss = 2556, Miss_rate = 0.063, Pending_hits = 9572, Reservation_fails = 18611
	L1D_cache_core[9]: Access = 40323, Miss = 2556, Miss_rate = 0.063, Pending_hits = 9579, Reservation_fails = 17753
	L1D_cache_core[10]: Access = 40328, Miss = 2568, Miss_rate = 0.064, Pending_hits = 9588, Reservation_fails = 19928
	L1D_cache_core[11]: Access = 40322, Miss = 2570, Miss_rate = 0.064, Pending_hits = 9591, Reservation_fails = 18029
	L1D_cache_core[12]: Access = 40328, Miss = 2580, Miss_rate = 0.064, Pending_hits = 9588, Reservation_fails = 17457
	L1D_cache_core[13]: Access = 40338, Miss = 2572, Miss_rate = 0.064, Pending_hits = 9599, Reservation_fails = 17935
	L1D_cache_core[14]: Access = 40343, Miss = 2570, Miss_rate = 0.064, Pending_hits = 9605, Reservation_fails = 16746
	L1D_total_cache_accesses = 604628
	L1D_total_cache_misses = 38436
	L1D_total_cache_miss_rate = 0.0636
	L1D_total_cache_pending_hits = 143587
	L1D_total_cache_reservation_fails = 280445
	L1D_cache_data_port_util = 0.068
	L1D_cache_fill_port_util = 0.006
L1C_cache:
	L1C_total_cache_accesses = 3600
	L1C_total_cache_misses = 900
	L1C_total_cache_miss_rate = 0.2500
	L1C_total_cache_pending_hits = 0
	L1C_total_cache_reservation_fails = 0
L1T_cache:
	L1T_total_cache_accesses = 0
	L1T_total_cache_misses = 0
	L1T_total_cache_pending_hits = 0
	L1T_total_cache_reservation_fails = 0

Total_core_cache_stats:
	Total_core_cache_stats_breakdown[GLOBAL_ACC_R][HIT] = 422605
	Total_core_cache_stats_breakdown[GLOBAL_ACC_R][HIT_RESERVED] = 143587
	Total_core_cache_stats_breakdown[GLOBAL_ACC_R][MISS] = 34993
	Total_core_cache_stats_breakdown[GLOBAL_ACC_R][RESERVATION_FAIL] = 135803
	Total_core_cache_stats_breakdown[CONST_ACC_R][HIT] = 2700
	Total_core_cache_stats_breakdown[CONST_ACC_R][MISS] = 900
	Total_core_cache_stats_breakdown[GLOBAL_ACC_W][MISS] = 3443
	Total_core_cache_stats_breakdown[GLOBAL_ACC_W][RESERVATION_FAIL] = 144642
	Total_core_cache_stats_breakdown[INST_ACC_R][HIT] = 693000
	Total_core_cache_stats_breakdown[INST_ACC_R][MISS] = 3598
Shader 0 warp_id issue distribution:
warp_id:
0, 1, 2, 3, 4, 5, 6, 7, 
distro:
1388, 1388, 1388, 1388, 1388, 1388, 1388, 1388, 
gpgpu_n_tot_thrd_icount = 39974400
gpgpu_n_tot_w_icount = 1249200
gpgpu_n_stall_shd_mem = 614173
gpgpu_n_mem_read_local = 0
gpgpu_n_mem_write_local = 0
gpgpu_n_mem_read_global = 34993
gpgpu_n_mem_write_global = 3443
gpgpu_n_mem_texture = 0
gpgpu_n_mem_const = 120
gpgpu_n_load_insn  = 6750000
gpgpu_n_store_insn = 22500
gpgpu_n_shmem_insn = 0
gpgpu_n_tex_insn = 0
gpgpu_n_const_mem_insn = 0
gpgpu_n_param_mem_insn = 90000
gpgpu_n_shmem_bkconflict = 0
gpgpu_n_cache_bkconflict = 0
gpgpu_n_intrawarp_mshr_merge = 0
gpgpu_n_cmem_portconflict = 0
gpgpu_stall_shd_mem[c_mem][bk_conf] = 0
gpgpu_stall_shd_mem[c_mem][mshr_rc] = 0
gpgpu_stall_shd_mem[c_mem][icnt_rc] = 0
gpgpu_stall_shd_mem[c_mem][data_port_stall] = 0
gpgpu_stall_shd_mem[t_mem][mshr_rc] = 0
gpgpu_stall_shd_mem[t_mem][icnt_rc] = 0
gpgpu_stall_shd_mem[t_mem][data_port_stall] = 0
gpgpu_stall_shd_mem[s_mem][bk_conf] = 0
gpgpu_stall_shd_mem[gl_mem][bk_conf] = 0
gpgpu_stall_shd_mem[gl_mem][coal_stall] = 614173
gpgpu_stall_shd_mem[gl_mem][data_port_stall] = 0
gpgpu_stall_shd_mem[g_mem_ld][mshr_rc] = 0
gpgpu_stall_shd_mem[g_mem_ld][icnt_rc] = 0
gpgpu_stall_shd_mem[g_mem_ld][wb_icnt_rc] = 0
gpgpu_stall_shd_mem[g_mem_ld][wb_rsrv_fail] = 0
gpgpu_stall_shd_mem[g_mem_st][mshr_rc] = 0
gpgpu_stall_shd_mem[g_mem_st][icnt_rc] = 0
gpgpu_stall_shd_mem[g_mem_st][wb_icnt_rc] = 0
gpgpu_stall_shd_mem[g_mem_st][wb_rsrv_fail] = 0
gpgpu_stall_shd_mem[l_mem_ld][mshr_rc] = 0
gpgpu_stall_shd_mem[l_mem_ld][icnt_rc] = 0
gpgpu_stall_shd_mem[l_mem_ld][wb_icnt_rc] = 0
gpgpu_stall_shd_mem[l_mem_ld][wb_rsrv_fail] = 0
gpgpu_stall_shd_mem[l_mem_st][mshr_rc] = 0
gpgpu_stall_shd_mem[l_mem_st][icnt_rc] = 0
gpgpu_stall_shd_mem[l_mem_st][wb_icnt_rc] = 0
gpgpu_stall_shd_mem[l_mem_st][wb_rsrv_fail] = 0
gpu_reg_bank_conflict_stalls = 0
Warp Occupancy Distribution:
Stall:441805	W0_Idle:1288598	W0_Scoreboard:9482477	W1:0	W2:0	W3:0	W4:312300	W5:0	W6:0	W7:0	W8:0	W9:0	W10:0	W11:0	W12:0	W13:0	W14:0	W15:0	W16:0	W17:0	W18:0	W19:0	W20:0	W21:0	W22:0	W23:0	W24:0	W25:0	W26:0	W27:0	W28:0	W29:0	W30:0	W31:0	W32:936900
traffic_breakdown_coretomem[CONST_ACC_R] = 960 {8:120,}
traffic_breakdown_coretomem[GLOBAL_ACC_R] = 279944 {8:34993,}
traffic_breakdown_coretomem[GLOBAL_ACC_W] = 220472 {40:1891,72:1035,136:517,}
traffic_breakdown_coretomem[INST_ACC_R] = 3840 {8:480,}
traffic_breakdown_memtocore[CONST_ACC_R] = 8640 {72:120,}
traffic_breakdown_memtocore[GLOBAL_ACC_R] = 4759048 {136:34993,}
traffic_breakdown_memtocore[GLOBAL_ACC_W] = 27544 {8:3443,}
traffic_breakdown_memtocore[INST_ACC_R] = 65280 {136:480,}
maxmrqlatency = 264 
maxdqlatency = 0 
maxmflatency = 2966 
averagemflatency = 331 
max_icnt2mem_latency = 3027 
max_icnt2sh_latency = 52113 
mrq_lat_table:1864 	85 	65 	133 	191 	266 	357 	250 	1 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
dq_lat_table:0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
mf_lat_table:0 	0 	0 	0 	0 	0 	0 	18641 	15624 	2440 	1703 	148 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
icnt2mem_lat_table:0 	0 	0 	25799 	1252 	2017 	3082 	2408 	1558 	1817 	996 	107 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
icnt2sh_lat_table:0 	0 	0 	5374 	26785 	2935 	19 	0 	0 	0 	0 	0 	0 	0 	0 	3443 	0 	0 	0 	0 	0 	0 	0 	0 	
mf_lat_pw_table:0 	0 	0 	0 	0 	0 	0 	80 	12 	6 	4 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
maximum concurrent accesses to same row:
dram[0]:         1         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[1]:         2         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[2]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[3]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[4]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[5]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
maximum service time to same row:
dram[0]:     48986     50145         0         0      2487      2476      2843      3294      2488      3444     15918     16579     40804     41559     49682     49938 
dram[1]:     47947     50759         0         0      2569      2004      2601      2463      2029      3357     15954     16900     40935     41738     49974     49682 
dram[2]:     50116     49853         0         0      2013      2401      2741      2529      1994      3279     16219     16887     41074     41907     49691     49690 
dram[3]:     49710     50845         0         0      2009      2478      2429      2621      2751      3897     16263     17171     41160     41972     49677     49878 
dram[4]:     50237     49850         0         0      2488      2003      3332      2538      2485      3278     16469     17200     41247     42166     49919     49687 
dram[5]:     49859     50068         0         0      2007      2513      2469      3322      3301      3935     16591     17199     41469     42218     49687     49888 
average row accesses per activate:
dram[0]:  4.250000 14.000000      -nan      -nan 10.000000 10.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 64.000000 72.000000 83.000000 77.000000 
dram[1]:  7.000000 14.000000      -nan      -nan 10.000000 10.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 77.000000 61.000000 82.000000 85.000000 
dram[2]: 14.000000  9.000000      -nan      -nan 10.000000 12.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 63.000000 63.000000 86.000000 81.000000 
dram[3]: 13.000000  9.000000      -nan      -nan 10.000000 12.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 63.000000 68.000000 75.000000 85.000000 
dram[4]: 15.000000 11.000000      -nan      -nan 10.000000 12.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 68.000000 65.000000 78.000000 88.000000 
dram[5]: 16.000000  8.000000      -nan      -nan 10.000000 12.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 65.000000 67.000000 81.000000 81.000000 
average row locality = 3212/88 = 36.500000
number of total memory accesses made:
dram[0]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[1]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[2]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[3]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[4]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[5]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
total accesses: 0
min_bank_accesses = 0!
min_chip_accesses = 0!
number of total read accesses:
dram[0]:         9         6         0         0        10        10        32        32        32        32        32        32        32        32        32        32 
dram[1]:         8         6         0         0        10        10        32        32        32        32        32        32        32        32        32        32 
dram[2]:         6         4         0         0        10        12        32        32        32        32        32        32        32        32        32        32 
dram[3]:         6         4         0         0        10        12        32        32        32        32        32        32        32        32        32        32 
dram[4]:         6         4         0         0        10        12        32        32        32        32        32        32        32        32        32        32 
dram[5]:         6         4         0         0        10        12        32        32        32        32        32        32        32        32        32        32 
total reads: 2117
min_bank_accesses = 0!
chip skew: 355/352 = 1.01
number of total write accesses:
dram[0]:         8         8         0         0         0         0         0         0         0         0         0         0        32        40        51        45 
dram[1]:         6         8         0         0         0         0         0         0         0         0         0         0        45        29        50        53 
dram[2]:         8         5         0         0         0         0         0         0         0         0         0         0        31        31        54        49 
dram[3]:         7         5         0         0         0         0         0         0         0         0         0         0        31        36        43        53 
dram[4]:         9         7         0         0         0         0         0         0         0         0         0         0        36        33        46        56 
dram[5]:        10         4         0         0         0         0         0         0         0         0         0         0        33        35        49        49 
total writes: 1095
min_bank_accesses = 0!
chip skew: 191/175 = 1.09
average mf latency per bank:
dram[0]:       5972      1408    none      none        7624      6015      7869      7128      9772      7109      8243      8142      2583      2276      1126      1269
dram[1]:        832      1099    none      none        4678      6681      8215      6378      8926      6471      8058      8269      2353      2426      1412      1324
dram[2]:       1016       863    none      none        7296      5077      8207      5996      7922      6544      8246      8380      2536      2313      1209      1387
dram[3]:       1202       900    none      none        3901      6944      7219      6985      7134      7113      8836      8254      2245      2250      1322      1180
dram[4]:        954      1031    none      none        7742      4595      6331      6396      6875      6570      8289      8526      2078      2526      1299      1065
dram[5]:       1138      1430    none      none        4617      6835      5760      7269      6703      6460      7953      8525      2352      2337      1205      1207
maximum mf latency per bank:
dram[0]:       1445      1801         0         0      1719      1494      2630      2529      2515      1555       378       394      1663      1710      1771      1766
dram[1]:       1412      1863         0         0      1419      1625      2966      1991      2270      1260       385       394      1617      1737      1882      1992
dram[2]:       1748      1134         0         0      1929      1216      2610      2175      2008      1289       410       388      1659      1761      1974      1745
dram[3]:       1799      1558         0         0       615      1704      2483      1767      1313      1553       427       428      1645      1618      1735      1982
dram[4]:       1846      1376         0         0      1908       939      2307      2711      1561      1338       425       399      1707      1747      1971      1751
dram[5]:       1873      1435         0         0       784      1584      1609      2693      1359      1522       431       406      1760      1622      1768      1979

Number of Memory Banks Accessed per Memory Operation per Warp (from 0):
0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	
Average # of Memory Banks Accessed per Memory Operation per Warp=-nan

position of mrq chosen
0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	

average position of mrq chosen = -nan
Memory Partition 0: 
Cache L2_bank_000:
MSHR contents

Cache L2_bank_001:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[0]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67855 n_act=17 n_pre=3 n_req=539 n_rd=710 n_write=205 bw_util=0.0266
n_activity=5774 dram_eff=0.3169
bk0: 18a 68542i bk1: 12a 68532i bk2: 0a 68787i bk3: 0a 68789i bk4: 20a 68735i bk5: 20a 68735i bk6: 64a 68637i bk7: 64a 68637i bk8: 64a 68640i bk9: 64a 68645i bk10: 64a 68647i bk11: 64a 68647i bk12: 64a 67873i bk13: 64a 67623i bk14: 64a 67349i bk15: 64a 67377i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.30346
Memory Partition 1: 
Cache L2_bank_002:
MSHR contents

Cache L2_bank_003:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[1]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67854 n_act=15 n_pre=1 n_req=545 n_rd=708 n_write=212 bw_util=0.02675
n_activity=5886 dram_eff=0.3126
bk0: 16a 68532i bk1: 12a 68588i bk2: 0a 68788i bk3: 0a 68788i bk4: 20a 68735i bk5: 20a 68733i bk6: 64a 68638i bk7: 64a 68644i bk8: 64a 68647i bk9: 64a 68644i bk10: 64a 68645i bk11: 64a 68651i bk12: 64a 67904i bk13: 64a 67987i bk14: 64a 67318i bk15: 64a 67259i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.334918
Memory Partition 2: 
Cache L2_bank_004:
MSHR contents

Cache L2_bank_005:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[2]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67873 n_act=14 n_pre=0 n_req=530 n_rd=704 n_write=199 bw_util=0.02625
n_activity=5889 dram_eff=0.3067
bk0: 12a 68632i bk1: 8a 68675i bk2: 0a 68788i bk3: 0a 68789i bk4: 20a 68736i bk5: 24a 68729i bk6: 64a 68644i bk7: 64a 68646i bk8: 64a 68647i bk9: 64a 68651i bk10: 64a 68648i bk11: 64a 68648i bk12: 64a 67950i bk13: 64a 67719i bk14: 64a 67369i bk15: 64a 67364i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.253743
Memory Partition 3: 
Cache L2_bank_006:
MSHR contents

Cache L2_bank_007:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[3]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67880 n_act=14 n_pre=0 n_req=527 n_rd=704 n_write=192 bw_util=0.02605
n_activity=5725 dram_eff=0.313
bk0: 12a 68629i bk1: 8a 68661i bk2: 0a 68786i bk3: 0a 68786i bk4: 20a 68738i bk5: 24a 68729i bk6: 64a 68642i bk7: 64a 68643i bk8: 64a 68643i bk9: 64a 68649i bk10: 64a 68638i bk11: 64a 68650i bk12: 64a 67791i bk13: 64a 67660i bk14: 64a 67449i bk15: 64a 67385i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.27991
Memory Partition 4: 
Cache L2_bank_008:
MSHR contents

Cache L2_bank_009:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[4]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67869 n_act=14 n_pre=0 n_req=539 n_rd=704 n_write=203 bw_util=0.02637
n_activity=5871 dram_eff=0.309
bk0: 12a 68578i bk1: 8a 68708i bk2: 0a 68790i bk3: 0a 68790i bk4: 20a 68735i bk5: 24a 68730i bk6: 64a 68649i bk7: 64a 68649i bk8: 64a 68645i bk9: 64a 68641i bk10: 64a 68643i bk11: 64a 68647i bk12: 64a 67633i bk13: 64a 67711i bk14: 64a 67323i bk15: 64a 67376i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.317125
Memory Partition 5: 
Cache L2_bank_010:
MSHR contents

Cache L2_bank_011:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[5]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67872 n_act=14 n_pre=0 n_req=532 n_rd=704 n_write=200 bw_util=0.02628
n_activity=5855 dram_eff=0.3088
bk0: 12a 68631i bk1: 8a 68658i bk2: 0a 68787i bk3: 0a 68787i bk4: 20a 68731i bk5: 24a 68730i bk6: 64a 68647i bk7: 64a 68643i bk8: 64a 68645i bk9: 64a 68645i bk10: 64a 68648i bk11: 64a 68648i bk12: 64a 67817i bk13: 64a 67748i bk14: 64a 67487i bk15: 64a 67461i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.286524

========= L2 cache stats =========
L2_cache_bank[0]: Access = 3621, Miss = 179, Miss_rate = 0.049, Pending_hits = 321, Reservation_fails = 4452
L2_cache_bank[1]: Access = 3181, Miss = 176, Miss_rate = 0.055, Pending_hits = 296, Reservation_fails = 4350
L2_cache_bank[2]: Access = 3450, Miss = 178, Miss_rate = 0.052, Pending_hits = 315, Reservation_fails = 4369
L2_cache_bank[3]: Access = 3206, Miss = 176, Miss_rate = 0.055, Pending_hits = 296, Reservation_fails = 4198
L2_cache_bank[4]: Access = 3229, Miss = 176, Miss_rate = 0.055, Pending_hits = 297, Reservation_fails = 4050
L2_cache_bank[5]: Access = 3178, Miss = 176, Miss_rate = 0.055, Pending_hits = 292, Reservation_fails = 3913
L2_cache_bank[6]: Access = 3180, Miss = 176, Miss_rate = 0.055, Pending_hits = 301, Reservation_fails = 4277
L2_cache_bank[7]: Access = 3209, Miss = 176, Miss_rate = 0.055, Pending_hits = 306, Reservation_fails = 3642
L2_cache_bank[8]: Access = 3181, Miss = 176, Miss_rate = 0.055, Pending_hits = 298, Reservation_fails = 3777
L2_cache_bank[9]: Access = 3209, Miss = 176, Miss_rate = 0.055, Pending_hits = 308, Reservation_fails = 4112
L2_cache_bank[10]: Access = 3193, Miss = 176, Miss_rate = 0.055, Pending_hits = 290, Reservation_fails = 3935
L2_cache_bank[11]: Access = 3199, Miss = 176, Miss_rate = 0.055, Pending_hits = 301, Reservation_fails = 4179
L2_total_cache_accesses = 39036
L2_total_cache_misses = 2117
L2_total_cache_miss_rate = 0.0542
L2_total_cache_pending_hits = 3621
L2_total_cache_reservation_fails = 49254
L2_total_cache_breakdown:
	L2_cache_stats_breakdown[GLOBAL_ACC_R][HIT] = 30369
	L2_cache_stats_breakdown[GLOBAL_ACC_R][HIT_RESERVED] = 3216
	L2_cache_stats_breakdown[GLOBAL_ACC_R][MISS] = 1408
	L2_cache_stats_breakdown[GLOBAL_ACC_R][RESERVATION_FAIL] = 48229
	L2_cache_stats_breakdown[CONST_ACC_R][HIT] = 116
	L2_cache_stats_breakdown[CONST_ACC_R][HIT_RESERVED] = 3
	L2_cache_stats_breakdown[CONST_ACC_R][MISS] = 1
	L2_cache_stats_breakdown[CONST_ACC_R][RESERVATION_FAIL] = 129
	L2_cache_stats_breakdown[GLOBAL_ACC_W][HIT] = 2348
	L2_cache_stats_breakdown[GLOBAL_ACC_W][HIT_RESERVED] = 391
	L2_cache_stats_breakdown[GLOBAL_ACC_W][MISS] = 704
	L2_cache_stats_breakdown[GLOBAL_ACC_W][RESERVATION_FAIL] = 552
	L2_cache_stats_breakdown[INST_ACC_R][HIT] = 465
	L2_cache_stats_breakdown[INST_ACC_R][HIT_RESERVED] = 11
	L2_cache_stats_breakdown[INST_ACC_R][MISS] = 4
	L2_cache_stats_breakdown[INST_ACC_R][RESERVATION_FAIL] = 344
L2_cache_data_port_util = 0.204
L2_cache_fill_port_util = 0.014

icnt_total_pkts_mem_to_simt=181168
icnt_total_pkts_simt_to_mem=45065
LD_mem_lat_dist  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ST_mem_lat_dist  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
----------------------------Interconnect-DETAILS--------------------------------
Class 0:
Packet latency average = 26.5755
	minimum = 6
	maximum = 856
Network latency average = 18.7063
	minimum = 6
	maximum = 797
Slowest packet = 1042
Flit latency average = 13.4921
	minimum = 6
	maximum = 797
Slowest flit = 2494
Fragmentation average = 0.0075187
	minimum = 0
	maximum = 332
Injected packet rate average = 0.0554852
	minimum = 0.0496412 (at node 1)
	maximum = 0.0694823 (at node 15)
Accepted packet rate average = 0.0554852
	minimum = 0.0496412 (at node 1)
	maximum = 0.0694823 (at node 15)
Injected flit rate average = 0.160782
	minimum = 0.0571056 (at node 1)
	maximum = 0.320317 (at node 15)
Accepted flit rate average= 0.160782
	minimum = 0.0703074 (at node 21)
	maximum = 0.233181 (at node 12)
Injected packet length average = 2.89775
Accepted packet length average = 2.89775
Total in-flight flits = 0 (0 measured)
====== Overall Traffic Statistics ======
====== Traffic class 0 ======
Packet latency average = 26.5755 (1 samples)
	minimum = 6 (1 samples)
	maximum = 856 (1 samples)
Network latency average = 18.7063 (1 samples)
	minimum = 6 (1 samples)
	maximum = 797 (1 samples)
Flit latency average = 13.4921 (1 samples)
	minimum = 6 (1 samples)
	maximum = 797 (1 samples)
Fragmentation average = 0.0075187 (1 samples)
	minimum = 0 (1 samples)
	maximum = 332 (1 samples)
Injected packet rate average = 0.0554852 (1 samples)
	minimum = 0.0496412 (1 samples)
	maximum = 0.0694823 (1 samples)
Accepted packet rate average = 0.0554852 (1 samples)
	minimum = 0.0496412 (1 samples)
	maximum = 0.0694823 (1 samples)
Injected flit rate average = 0.160782 (1 samples)
	minimum = 0.0571056 (1 samples)
	maximum = 0.320317 (1 samples)
Accepted flit rate average = 0.160782 (1 samples)
	minimum = 0.0703074 (1 samples)
	maximum = 0.233181 (1 samples)
Injected packet size average = 2.89775 (1 samples)
Accepted packet size average = 2.89775 (1 samples)
Hops average = 1 (1 samples)
----------------------------END-of-Interconnect-DETAILS-------------------------


gpgpu_simulation_time = 0 days, 0 hrs, 1 min, 55 sec (115 sec)
gpgpu_simulation_rate = 271173 (inst/sec)
gpgpu_simulation_rate = 453 (cycle/sec)
total time is 115176 ms


        *** GPGPU-Sim Simulator Version 3.2.2  [build 0] ***


GPGPU-Sim PTX: simulation mode 0 (can change with PTX_SIM_MODE_FUNC environment variable:
               1=functional simulation only, 0=detailed performance simulator)
GPGPU-Sim: Configuration options:

-network_mode                           1 # Interconnection network mode
-inter_config_file   config_fermi_islip.icnt # Interconnection network config file
-gpgpu_ptx_use_cuobjdump                    1 # Use cuobjdump to extract ptx and sass from binaries
-gpgpu_experimental_lib_support                    0 # Try to extract code from cuda libraries [Broken because of unknown cudaGetExportTable]
-gpgpu_ptx_convert_to_ptxplus                    0 # Convert SASS (native ISA) to ptxplus and run ptxplus
-gpgpu_ptx_force_max_capability                   20 # Force maximum compute capability
-gpgpu_ptx_inst_debug_to_file                    0 # Dump executed instructions' debug information to file
-gpgpu_ptx_inst_debug_file       inst_debug.txt # Executed instructions' debug output file
-gpgpu_ptx_inst_debug_thread_uid                    1 # Thread UID for executed instructions' debug output
-gpgpu_simd_model                       1 # 1 = post-dominator
-gpgpu_shader_core_pipeline              1536:32 # shader core pipeline config, i.e., {<nthread>:<warpsize>}
-gpgpu_tex_cache:l1  4:128:24,L:R:m:N:L,F:128:4,128:2 # per-shader L1 texture cache  (READ-ONLY) config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq>:<rf>}
-gpgpu_const_cache:l1 64:64:2,L:R:f:N:L,A:2:32,4 # per-shader L1 constant memory cache  (READ-ONLY) config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq>} 
-gpgpu_cache:il1     4:128:4,L:R:f:N:L,A:2:32,4 # shader L1 instruction cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq>} 
-gpgpu_cache:dl1     32:128:4,L:L:m:N:H,A:32:8,8 # per-shader L1 data cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq> | none}
-gpgpu_cache:dl1PrefL1                 none # per-shader L1 data cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq> | none}
-gpgpu_cache:dl1PreShared                 none # per-shader L1 data cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq> | none}
-gmem_skip_L1D                          0 # global memory access skip L1D cache (implements -Xptxas -dlcm=cg, default=no skip)
-gpgpu_perfect_mem                      0 # enable perfect memory mode (no cache miss)
-n_regfile_gating_group                    4 # group of lanes that should be read/written together)
-gpgpu_clock_gated_reg_file                    0 # enable clock gated reg file for power calculations
-gpgpu_clock_gated_lanes                    0 # enable clock gated lanes for power calculations
-gpgpu_shader_registers                32768 # Number of registers per shader core. Limits number of concurrent CTAs. (default 8192)
-gpgpu_shader_cta                       8 # Maximum number of concurrent CTAs in shader (default 8)
-gpgpu_num_cta_barriers                   16 # Maximum number of named barriers per CTA (default 16)
-gpgpu_n_clusters                      15 # number of processing clusters
-gpgpu_n_cores_per_cluster                    8 # number of simd cores per cluster
-gpgpu_n_cluster_ejection_buffer_size                    8 # number of packets in ejection buffer
-gpgpu_n_ldst_response_buffer_size                    2 # number of response packets in ld/st unit ejection buffer
-gpgpu_shmem_size                   16384 # Size of shared memory per shader core (default 16kB)
-gpgpu_shmem_size                   49152 # Size of shared memory per shader core (default 16kB)
-gpgpu_shmem_size_PrefL1                16384 # Size of shared memory per shader core (default 16kB)
-gpgpu_shmem_size_PrefShared                16384 # Size of shared memory per shader core (default 16kB)
-gpgpu_shmem_num_banks                   32 # Number of banks in the shared memory in each shader core (default 16)
-gpgpu_shmem_limited_broadcast                    0 # Limit shared memory to do one broadcast per cycle (default on)
-gpgpu_shmem_warp_parts                    1 # Number of portions a warp is divided into for shared memory bank conflict check 
-gpgpu_warpdistro_shader                   -1 # Specify which shader core to collect the warp size distribution from
-gpgpu_warp_issue_shader                    0 # Specify which shader core to collect the warp issue distribution from
-gpgpu_local_mem_map                    1 # Mapping from local memory space address to simulated GPU physical address space (default = enabled)
-gpgpu_num_reg_banks                   16 # Number of register banks (default = 8)
-gpgpu_reg_bank_use_warp_id                    0 # Use warp ID in mapping registers to banks (default = off)
-gpgpu_operand_collector_num_units_sp                    6 # number of collector units (default = 4)
-gpgpu_operand_collector_num_units_sfu                    8 # number of collector units (default = 4)
-gpgpu_operand_collector_num_units_mem                    2 # number of collector units (default = 2)
-gpgpu_operand_collector_num_units_gen                    0 # number of collector units (default = 0)
-gpgpu_operand_collector_num_in_ports_sp                    2 # number of collector unit in ports (default = 1)
-gpgpu_operand_collector_num_in_ports_sfu                    1 # number of collector unit in ports (default = 1)
-gpgpu_operand_collector_num_in_ports_mem                    1 # number of collector unit in ports (default = 1)
-gpgpu_operand_collector_num_in_ports_gen                    0 # number of collector unit in ports (default = 0)
-gpgpu_operand_collector_num_out_ports_sp                    2 # number of collector unit in ports (default = 1)
-gpgpu_operand_collector_num_out_ports_sfu                    1 # number of collector unit in ports (default = 1)
-gpgpu_operand_collector_num_out_ports_mem                    1 # number of collector unit in ports (default = 1)
-gpgpu_operand_collector_num_out_ports_gen                    0 # number of collector unit in ports (default = 0)
-gpgpu_coalesce_arch                   13 # Coalescing arch (default = 13, anything else is off for now)
-gpgpu_num_sched_per_core                    2 # Number of warp schedulers per core
-gpgpu_max_insn_issue_per_warp                    1 # Max number of instructions that can be issued per warp in one cycle by scheduler
-gpgpu_simt_core_sim_order                    1 # Select the simulation order of cores in a cluster (0=Fix, 1=Round-Robin)
-gpgpu_pipeline_widths        2,1,1,2,1,1,2 # Pipeline widths ID_OC_SP,ID_OC_SFU,ID_OC_MEM,OC_EX_SP,OC_EX_SFU,OC_EX_MEM,EX_WB
-gpgpu_num_sp_units                     2 # Number of SP units (default=1)
-gpgpu_num_sfu_units                    1 # Number of SF units (default=1)
-gpgpu_num_mem_units                    1 # Number if ldst units (default=1) WARNING: not hooked up to anything
-gpgpu_scheduler                      gto # Scheduler configuration: < lrr | gto | two_level_active > If two_level_active:<num_active_warps>:<inner_prioritization>:<outer_prioritization>For complete list of prioritization values see shader.h enum scheduler_prioritization_typeDefault: gto
-gpgpu_dram_scheduler                    1 # 0 = fifo, 1 = FR-FCFS (defaul)
-gpgpu_dram_partition_queues              8:8:8:8 # i2$:$2d:d2$:$2i
-l2_ideal                               0 # Use a ideal L2 cache that always hit
-gpgpu_cache:dl2     64:128:8,L:B:m:W:L,A:32:4,4:0,32 # unified banked L2 data cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq>}
-gpgpu_cache:dl2_texture_only                    0 # L2 cache used for texture only
-gpgpu_n_mem                            6 # number of memory modules (e.g. memory controllers) in gpu
-gpgpu_n_sub_partition_per_mchannel                    2 # number of memory subpartition in each memory module
-gpgpu_n_mem_per_ctrlr                    2 # number of memory chips per memory controller
-gpgpu_memlatency_stat                   14 # track and display latency statistics 0x2 enables MC, 0x4 enables queue logs
-gpgpu_frfcfs_dram_sched_queue_size                   16 # 0 = unlimited (default); # entries per chip
-gpgpu_dram_return_queue_size                  116 # 0 = unlimited (default); # entries per chip
-gpgpu_dram_buswidth                    4 # default = 4 bytes (8 bytes per cycle at DDR)
-gpgpu_dram_burst_length                    8 # Burst length of each DRAM request (default = 4 data bus cycle)
-dram_data_command_freq_ratio                    4 # Frequency ratio between DRAM data bus and command bus (default = 2 times, i.e. DDR)
-gpgpu_dram_timing_opt nbk=16:CCD=2:RRD=6:RCD=12:RAS=28:RP=12:RC=40: CL=12:WL=4:CDLR=5:WR=12:nbkgrp=4:CCDL=3:RTPL=2 # DRAM timing parameters = {nbk:tCCD:tRRD:tRCD:tRAS:tRP:tRC:CL:WL:tCDLR:tWR:nbkgrp:tCCDL:tRTPL}
-rop_latency                          120 # ROP queue latency (default 85)
-dram_latency                         100 # DRAM latency (default 30)
-gpgpu_mem_addr_mapping dramid@8;00000000.00000000.00000000.00000000.0000RRRR.RRRRRRRR.BBBCCCCB.CCSSSSSS # mapping memory address to dram model {dramid@<start bit>;<memory address map>}
-gpgpu_mem_addr_test                    0 # run sweep test to check address mapping for aliased address
-gpgpu_mem_address_mask                    1 # 0 = old addressing mask, 1 = new addressing mask, 2 = new add. mask + flipped bank sel and chip sel bits
-gpuwattch_xml_file  gpuwattch_gtx480.xml # GPUWattch XML file
-power_simulation_enabled                    1 # Turn on power simulator (1=On, 0=Off)
-power_per_cycle_dump                    0 # Dump detailed power output each cycle
-power_trace_enabled                    0 # produce a file for the power trace (1=On, 0=Off)
-power_trace_zlevel                     6 # Compression level of the power trace output log (0=no comp, 9=highest)
-steady_power_levels_enabled                    0 # produce a file for the steady power levels (1=On, 0=Off)
-steady_state_definition                  8:4 # allowed deviation:number of samples
-gpgpu_max_cycle                        0 # terminates gpu simulation early (0 = no limit)
-gpgpu_max_insn                         0 # terminates gpu simulation early (0 = no limit)
-gpgpu_max_cta                          0 # terminates gpu simulation early (0 = no limit)
-gpgpu_runtime_stat                   500 # display runtime statistics such as dram utilization {<freq>:<flag>}
-liveness_message_freq                    1 # Minimum number of seconds between simulation liveness messages (0 = always print)
-gpgpu_flush_l1_cache                    0 # Flush L1 cache at the end of each kernel call
-gpgpu_flush_l2_cache                    0 # Flush L2 cache at the end of each kernel call
-gpgpu_deadlock_detect                    1 # Stop the simulation at deadlock (1=on (default), 0=off)
-gpgpu_ptx_instruction_classification                    0 # if enabled will classify ptx instruction types per kernel (Max 255 kernels now)
-gpgpu_ptx_sim_mode                     0 # Select between Performance (default) or Functional simulation (1)
-gpgpu_clock_domains 700.0:700.0:700.0:924.0 # Clock Domain Frequencies in MhZ {<Core Clock>:<ICNT Clock>:<L2 Clock>:<DRAM Clock>}
-gpgpu_max_concurrent_kernel                    8 # maximum kernels that can run concurrently on GPU
-gpgpu_cflog_interval                    0 # Interval between each snapshot in control flow logger
-visualizer_enabled                     0 # Turn on visualizer output (1=On, 0=Off)
-visualizer_outputfile                 NULL # Specifies the output log file for visualizer
-visualizer_zlevel                      6 # Compression level of the visualizer output log (0=no comp, 9=highest)
-trace_enabled                          0 # Turn on traces
-trace_components                    none # comma seperated list of traces to enable. Complete list found in trace_streams.tup. Default none
-trace_sampling_core                    0 # The core which is printed using CORE_DPRINTF. Default 0
-trace_sampling_memory_partition                   -1 # The memory partition which is printed using MEMPART_DPRINTF. Default -1 (i.e. all)
-enable_ptx_file_line_stats                    1 # Turn on PTX source line statistic profiling. (1 = On)
-ptx_line_stats_filename gpgpu_inst_stats.txt # Output file for PTX source line statistics.
-save_embedded_ptx                      0 # saves ptx files embedded in binary as <n>.ptx
-keep                                   0 # keep intermediate files created by GPGPU-Sim when interfacing with external programs
-gpgpu_ptx_save_converted_ptxplus                    0 # Saved converted ptxplus to a file
-ptx_opcode_latency_int         4,13,4,5,145 # Opcode latencies for integers <ADD,MAX,MUL,MAD,DIV>Default 1,1,19,25,145
-ptx_opcode_latency_fp          4,13,4,5,39 # Opcode latencies for single precision floating points <ADD,MAX,MUL,MAD,DIV>Default 1,1,1,1,30
-ptx_opcode_latency_dp         8,19,8,8,330 # Opcode latencies for double precision floating points <ADD,MAX,MUL,MAD,DIV>Default 8,8,8,8,335
-ptx_opcode_initiation_int            1,2,2,1,8 # Opcode initiation intervals for integers <ADD,MAX,MUL,MAD,DIV>Default 1,1,4,4,32
-ptx_opcode_initiation_fp            1,2,1,1,4 # Opcode initiation intervals for single precision floating points <ADD,MAX,MUL,MAD,DIV>Default 1,1,1,1,5
-ptx_opcode_initiation_dp         8,16,8,8,130 # Opcode initiation intervals for double precision floating points <ADD,MAX,MUL,MAD,DIV>Default 8,8,8,8,130
DRAM Timing Options:
nbk                                    16 # number of banks
CCD                                     2 # column to column delay
RRD                                     6 # minimal delay between activation of rows in different banks
RCD                                    12 # row to column delay
RAS                                    28 # time needed to activate row
RP                                     12 # time needed to precharge (deactivate) row
RC                                     40 # row cycle time
CDLR                                    5 # switching from write to read (changes tWTR)
WR                                     12 # last data-in to row precharge
CL                                     12 # CAS latency
WL                                      4 # Write latency
nbkgrp                                  4 # number of bank groups
CCDL                                    3 # column to column delay between accesses to different bank groups
RTPL                                    2 # read to precharge delay between accesses to different bank groups
Total number of memory sub partition = 12
addr_dec_mask[CHIP]  = 0000000000000000 	high:64 low:0
addr_dec_mask[BK]    = 000000000000e100 	high:16 low:8
addr_dec_mask[ROW]   = 000000000fff0000 	high:28 low:16
addr_dec_mask[COL]   = 0000000000001eff 	high:13 low:0
addr_dec_mask[BURST] = 000000000000003f 	high:6 low:0
sub_partition_id_mask = 0000000000000100
GPGPU-Sim uArch: clock freqs: 700000000.000000:700000000.000000:700000000.000000:924000000.000000
GPGPU-Sim uArch: clock periods: 0.00000000142857142857:0.00000000142857142857:0.00000000142857142857:0.00000000108225108225
*** Initializing Memory Statistics ***
GPGPU-Sim uArch: interconnect node map (shaderID+MemID to icntID)
GPGPU-Sim uArch: Memory nodes ID start from index: 15
GPGPU-Sim uArch:    0   1   2   3   4
GPGPU-Sim uArch:    5   6   7   8   9
GPGPU-Sim uArch:   10  11  12  13  14
GPGPU-Sim uArch:   15  16  17  18  19
GPGPU-Sim uArch:   20  21  22  23  24
GPGPU-Sim uArch:   25  26
GPGPU-Sim uArch: interconnect node reverse map (icntID to shaderID+MemID)
GPGPU-Sim uArch: Memory nodes start from ID: 15
GPGPU-Sim uArch:    0   1   2   3   4
GPGPU-Sim uArch:    5   6   7   8   9
GPGPU-Sim uArch:   10  11  12  13  14
GPGPU-Sim uArch:   15  16  17  18  19
GPGPU-Sim uArch:   20  21  22  23  24
GPGPU-Sim uArch:   25  26
8b51d2418a0658287a30fe3c4cc1fd21  /home/ly/下载/test/gpgpu-sim_distribution-master/ispass2009-benchmarks-master_2/bin/release/MM
GPGPU-Sim uArch: performance model initialization complete.
GPGPU-Sim PTX: __cudaRegisterFatBinary, fat_cubin_handle = 1, filename=mm.cu
self exe links to: /home/ly/下载/test/gpgpu-sim_distribution-master/ispass2009-benchmarks-master_2/bin/release/MM
Running md5sum using "md5sum /home/ly/下载/test/gpgpu-sim_distribution-master/ispass2009-benchmarks-master_2/bin/release/MM "
Running cuobjdump using "$CUDA_INSTALL_PATH/bin/cuobjdump -ptx -elf -sass /home/ly/下载/test/gpgpu-sim_distribution-master/ispass2009-benchmarks-master_2/bin/release/MM > _cuobjdump_complete_output_TX1yFU"
Parsing file _cuobjdump_complete_output_TX1yFU
######### cuobjdump parser ########
## Adding new section ELF
Adding arch: sm_10
Adding identifier: mm.cu
## Adding new section PTX
Adding ptx filename: _cuobjdump_1.ptx
Adding arch: sm_10
Adding identifier: mm.cu
## Adding new section ELF
Adding arch: sm_20
Adding identifier: mm.cu
## Adding new section PTX
Adding ptx filename: _cuobjdump_2.ptx
Adding arch: sm_20
Adding identifier: mm.cu
Done parsing!!!
GPGPU-Sim PTX: __cudaRegisterFunction _Z14matrix_mul_gpuPiS_S_i : hostFun 0x0x400ce0, fat_cubin_handle = 1
GPGPU-Sim PTX: instruction assembly for function '_Z14matrix_mul_gpuPiS_S_i'...   done.
GPGPU-Sim PTX: finding reconvergence points for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: Finding dominators for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: Finding immediate dominators for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: Finding postdominators for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: Finding immediate postdominators for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: pre-decoding instructions for '_Z14matrix_mul_gpuPiS_S_i'...
GPGPU-Sim PTX: reconvergence points for _Z14matrix_mul_gpuPiS_S_i...
GPGPU-Sim PTX:  1 (potential) branch divergence @  PC=0x048 (_1.ptx:71) @%p1 bra $Lt_0_2306;
GPGPU-Sim PTX:    immediate post dominator      @  PC=0x170 (_1.ptx:114) ld.param.u64 %rd11, [__cudaparm__Z14matrix_mul_gpuPiS_S_i_P];
GPGPU-Sim PTX:  2 (potential) branch divergence @  PC=0x130 (_1.ptx:103) @%p2 bra $Lt_0_1794;
GPGPU-Sim PTX:    immediate post dominator      @  PC=0x138 (_1.ptx:104) bra.uni $Lt_0_1282;
GPGPU-Sim PTX:  3 (potential) branch divergence @  PC=0x138 (_1.ptx:104) bra.uni $Lt_0_1282;
GPGPU-Sim PTX:    immediate post dominator      @  PC=0x170 (_1.ptx:114) ld.param.u64 %rd11, [__cudaparm__Z14matrix_mul_gpuPiS_S_i_P];
GPGPU-Sim PTX: ... end of reconvergence points for _Z14matrix_mul_gpuPiS_S_i
GPGPU-Sim PTX: ... done pre-decoding instructions for '_Z14matrix_mul_gpuPiS_S_i'.
GPGPU-Sim PTX: finished parsing EMBEDDED .ptx file _1.ptx
Adding _cuobjdump_2.ptx with cubin handle 1
GPGPU-Sim PTX: extracting embedded .ptx to temporary file "_ptx_jRGAGx"
Running: cat _ptx_jRGAGx | sed 's/.version 1.5/.version 1.4/' | sed 's/, texmode_independent//' | sed 's/\(\.extern \.const\[1\] .b8 \w\+\)\[\]/\1\[1\]/' | sed 's/const\[.\]/const\[0\]/g' > _ptx2_NU5CHa
GPGPU-Sim PTX: generating ptxinfo using "$CUDA_INSTALL_PATH/bin/ptxas --gpu-name=sm_20 -v _ptx2_NU5CHa --output-file  /dev/null 2> _ptx_jRGAGxinfo"
GPGPU-Sim PTX: Kernel '_Z14matrix_mul_gpuPiS_S_i' : regs=14, lmem=0, smem=0, cmem=60
GPGPU-Sim PTX: removing ptxinfo using "rm -f _ptx_jRGAGx _ptx2_NU5CHa _ptx_jRGAGxinfo"
GPGPU-Sim PTX: loading globals with explicit initializers... 
GPGPU-Sim PTX: finished loading globals (0 bytes total).
GPGPU-Sim PTX: loading constants with explicit initializers...  done.
Block(10,10)   Grid(15,15).

GPGPU-Sim PTX: cudaLaunch for 0x0x400ce0 (mode=performance simulation) on stream 0
GPGPU-Sim PTX: pushing kernel '_Z14matrix_mul_gpuPiS_S_i' to stream 0, gridDim= (15,15,1) blockDim = (10,10,1) 
kernel '_Z14matrix_mul_gpuPiS_S_i' transfer to GPU hardware scheduler
GPGPU-Sim uArch: Shader 8 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: CTA/core = 8, limited by: cta_limit
GPGPU-Sim uArch: core:  8, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 16 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 16, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 24 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 24, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 32 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 32, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 40 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 40, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 48 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 48, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 56 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 56, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 64 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 64, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 72 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 72, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 80 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 80, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 88 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 88, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 96 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 96, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 104 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:104, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 112 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:112, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 0 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  0, cta: 0 initialized @(1,0)
GPGPU-Sim uArch: Shader 9 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  9, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 17 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 17, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 25 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 25, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 33 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 33, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 41 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 41, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 49 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 49, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 57 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 57, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 65 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 65, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 73 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 73, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 81 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 81, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 89 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 89, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 97 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 97, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 105 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:105, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 113 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:113, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 1 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  1, cta: 0 initialized @(2,0)
GPGPU-Sim uArch: Shader 10 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 10, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 18 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 18, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 26 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 26, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 34 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 34, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 42 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 42, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 50 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 50, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 58 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 58, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 66 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 66, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 74 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 74, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 82 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 82, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 90 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 90, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 98 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 98, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 106 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:106, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 114 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:114, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 2 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  2, cta: 0 initialized @(3,0)
GPGPU-Sim uArch: Shader 11 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 11, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 19 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 19, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 27 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 27, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 35 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 35, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 43 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 43, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 51 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 51, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 59 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 59, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 67 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 67, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 75 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 75, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 83 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 83, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 91 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 91, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 99 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 99, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 107 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:107, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 115 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:115, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 3 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  3, cta: 0 initialized @(4,0)
GPGPU-Sim uArch: Shader 12 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 12, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 20 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 20, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 28 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 28, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 36 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 36, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 44 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 44, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 52 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 52, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 60 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 60, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 68 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 68, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 76 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 76, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 84 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 84, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 92 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 92, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 100 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:100, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 108 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:108, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 116 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:116, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 4 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  4, cta: 0 initialized @(5,0)
GPGPU-Sim uArch: Shader 13 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 13, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 21 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 21, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 29 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 29, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 37 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 37, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 45 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 45, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 53 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 53, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 61 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 61, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 69 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 69, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 77 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 77, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 85 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 85, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 93 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 93, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 101 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:101, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 109 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:109, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 117 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:117, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 5 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  5, cta: 0 initialized @(6,0)
GPGPU-Sim uArch: Shader 14 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 14, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 22 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 22, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 30 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 30, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 38 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 38, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 46 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 46, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 54 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 54, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 62 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 62, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 70 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 70, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 78 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 78, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 86 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 86, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 94 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 94, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 102 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:102, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 110 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:110, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 118 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:118, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 6 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  6, cta: 0 initialized @(7,0)
GPGPU-Sim uArch: Shader 15 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 15, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 23 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 23, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 31 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 31, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 39 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 39, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 47 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 47, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 55 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 55, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 63 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 63, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 71 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 71, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 79 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 79, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 87 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 87, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 95 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core: 95, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 103 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:103, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 111 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:111, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 119 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:119, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: Shader 7 bind to kernel 1 '_Z14matrix_mul_gpuPiS_S_i'
GPGPU-Sim uArch: core:  7, cta: 0 initialized @(8,0)
GPGPU-Sim uArch: core:  8, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 16, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 24, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 32, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 40, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 48, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 56, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 64, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 72, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 80, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 88, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core: 96, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core:104, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core:112, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core:  0, cta: 1 initialized @(9,0)
GPGPU-Sim uArch: core:  9, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 17, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 25, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 33, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 41, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 49, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 57, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 65, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 73, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 81, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 89, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 97, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core:105, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core:113, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core:  1, cta: 1 initialized @(10,0)
GPGPU-Sim uArch: core: 10, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 18, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 26, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 34, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 42, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 50, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 58, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 66, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 74, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 82, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 90, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 98, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core:106, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core:114, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core:  2, cta: 1 initialized @(11,0)
GPGPU-Sim uArch: core: 11, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 19, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 27, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 35, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 43, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 51, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 59, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 67, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 75, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 83, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 91, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 99, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core:107, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core:115, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core:  3, cta: 1 initialized @(12,0)
GPGPU-Sim uArch: core: 12, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 20, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 28, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 36, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 44, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 52, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 60, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 68, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 76, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 84, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 92, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core:100, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core:108, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core:116, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core:  4, cta: 1 initialized @(13,0)
GPGPU-Sim uArch: core: 13, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 21, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 29, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 37, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 45, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 53, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 61, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 69, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 77, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 85, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 93, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core:101, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core:109, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core:117, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core:  5, cta: 1 initialized @(14,0)
GPGPU-Sim uArch: core: 14, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 22, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 30, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 38, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 46, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 54, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 62, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 70, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 78, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 86, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core: 94, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core:102, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core:110, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core:118, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: core:  6, cta: 1 initialized @(15,0)
GPGPU-Sim uArch: cycles simulated: 500  inst.: 49456 (ipc=98.9) sim_rate=49456 (inst/sec) elapsed = 0:0:00:01 / Mon Jun 14 16:18:46 2021
GPGPU-Sim PTX: 100000 instructions simulated : ctaid=(2,12,0) tid=(5,7,0)
GPGPU-Sim uArch: cycles simulated: 1000  inst.: 155464 (ipc=155.5) sim_rate=77732 (inst/sec) elapsed = 0:0:00:02 / Mon Jun 14 16:18:47 2021
GPGPU-Sim PTX: 200000 instructions simulated : ctaid=(1,9,0) tid=(7,2,0)
GPGPU-Sim PTX: 300000 instructions simulated : ctaid=(13,12,0) tid=(9,1,0)
GPGPU-Sim uArch: cycles simulated: 1500  inst.: 294800 (ipc=196.5) sim_rate=98266 (inst/sec) elapsed = 0:0:00:03 / Mon Jun 14 16:18:48 2021
GPGPU-Sim PTX: 400000 instructions simulated : ctaid=(6,0,0) tid=(5,9,0)
GPGPU-Sim PTX: 500000 instructions simulated : ctaid=(3,11,0) tid=(7,4,0)
GPGPU-Sim PTX: 600000 instructions simulated : ctaid=(0,5,0) tid=(5,5,0)
GPGPU-Sim PTX: 700000 instructions simulated : ctaid=(3,7,0) tid=(1,1,0)
GPGPU-Sim uArch: cycles simulated: 2500  inst.: 658596 (ipc=263.4) sim_rate=131719 (inst/sec) elapsed = 0:0:00:05 / Mon Jun 14 16:18:50 2021
GPGPU-Sim uArch: cycles simulated: 3000  inst.: 686456 (ipc=228.8) sim_rate=114409 (inst/sec) elapsed = 0:0:00:06 / Mon Jun 14 16:18:51 2021
GPGPU-Sim PTX: 800000 instructions simulated : ctaid=(11,9,0) tid=(9,7,0)
GPGPU-Sim uArch: cycles simulated: 4000  inst.: 767180 (ipc=191.8) sim_rate=109597 (inst/sec) elapsed = 0:0:00:07 / Mon Jun 14 16:18:52 2021
GPGPU-Sim uArch: cycles simulated: 4500  inst.: 852268 (ipc=189.4) sim_rate=106533 (inst/sec) elapsed = 0:0:00:08 / Mon Jun 14 16:18:53 2021
GPGPU-Sim PTX: 900000 instructions simulated : ctaid=(4,10,0) tid=(9,1,0)
GPGPU-Sim PTX: 1000000 instructions simulated : ctaid=(9,0,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 5000  inst.: 1010580 (ipc=202.1) sim_rate=112286 (inst/sec) elapsed = 0:0:00:09 / Mon Jun 14 16:18:54 2021
GPGPU-Sim PTX: 1100000 instructions simulated : ctaid=(14,2,0) tid=(3,0,0)
GPGPU-Sim PTX: 1200000 instructions simulated : ctaid=(13,8,0) tid=(1,7,0)
GPGPU-Sim PTX: 1300000 instructions simulated : ctaid=(1,12,0) tid=(5,1,0)
GPGPU-Sim PTX: 1400000 instructions simulated : ctaid=(3,6,0) tid=(9,3,0)
GPGPU-Sim uArch: cycles simulated: 5500  inst.: 1387024 (ipc=252.2) sim_rate=138702 (inst/sec) elapsed = 0:0:00:10 / Mon Jun 14 16:18:55 2021
GPGPU-Sim PTX: 1500000 instructions simulated : ctaid=(10,10,0) tid=(3,6,0)
GPGPU-Sim PTX: 1600000 instructions simulated : ctaid=(6,3,0) tid=(3,4,0)
GPGPU-Sim PTX: 1700000 instructions simulated : ctaid=(11,7,0) tid=(7,0,0)
GPGPU-Sim PTX: 1800000 instructions simulated : ctaid=(4,10,0) tid=(9,7,0)
GPGPU-Sim uArch: cycles simulated: 6000  inst.: 1834944 (ipc=305.8) sim_rate=152912 (inst/sec) elapsed = 0:0:00:12 / Mon Jun 14 16:18:57 2021
GPGPU-Sim PTX: 1900000 instructions simulated : ctaid=(10,0,0) tid=(5,3,0)
GPGPU-Sim PTX: 2000000 instructions simulated : ctaid=(12,3,0) tid=(9,5,0)
GPGPU-Sim PTX: 2100000 instructions simulated : ctaid=(3,1,0) tid=(3,4,0)
GPGPU-Sim PTX: 2200000 instructions simulated : ctaid=(8,9,0) tid=(3,8,0)
GPGPU-Sim PTX: 2300000 instructions simulated : ctaid=(12,7,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 6500  inst.: 2264888 (ipc=348.4) sim_rate=174222 (inst/sec) elapsed = 0:0:00:13 / Mon Jun 14 16:18:58 2021
GPGPU-Sim PTX: 2400000 instructions simulated : ctaid=(6,6,0) tid=(1,9,0)
GPGPU-Sim PTX: 2500000 instructions simulated : ctaid=(9,9,0) tid=(3,0,0)
GPGPU-Sim PTX: 2600000 instructions simulated : ctaid=(4,12,0) tid=(3,4,0)
GPGPU-Sim PTX: 2700000 instructions simulated : ctaid=(12,1,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 7000  inst.: 2673128 (ipc=381.9) sim_rate=190937 (inst/sec) elapsed = 0:0:00:14 / Mon Jun 14 16:18:59 2021
GPGPU-Sim PTX: 2800000 instructions simulated : ctaid=(3,0,0) tid=(7,2,0)
GPGPU-Sim PTX: 2900000 instructions simulated : ctaid=(0,10,0) tid=(1,5,0)
GPGPU-Sim PTX: 3000000 instructions simulated : ctaid=(5,4,0) tid=(1,3,0)
GPGPU-Sim uArch: cycles simulated: 7500  inst.: 3038248 (ipc=405.1) sim_rate=202549 (inst/sec) elapsed = 0:0:00:15 / Mon Jun 14 16:19:00 2021
GPGPU-Sim PTX: 3100000 instructions simulated : ctaid=(5,1,0) tid=(5,1,0)
GPGPU-Sim PTX: 3200000 instructions simulated : ctaid=(12,11,0) tid=(7,8,0)
GPGPU-Sim PTX: 3300000 instructions simulated : ctaid=(5,1,0) tid=(5,1,0)
GPGPU-Sim PTX: 3400000 instructions simulated : ctaid=(4,12,0) tid=(9,7,0)
GPGPU-Sim uArch: cycles simulated: 8000  inst.: 3416336 (ipc=427.0) sim_rate=200960 (inst/sec) elapsed = 0:0:00:17 / Mon Jun 14 16:19:02 2021
GPGPU-Sim PTX: 3500000 instructions simulated : ctaid=(11,13,0) tid=(1,9,0)
GPGPU-Sim PTX: 3600000 instructions simulated : ctaid=(10,12,0) tid=(9,1,0)
GPGPU-Sim PTX: 3700000 instructions simulated : ctaid=(4,8,0) tid=(7,4,0)
GPGPU-Sim PTX: 3800000 instructions simulated : ctaid=(13,11,0) tid=(3,0,0)
GPGPU-Sim uArch: cycles simulated: 8500  inst.: 3794960 (ipc=446.5) sim_rate=210831 (inst/sec) elapsed = 0:0:00:18 / Mon Jun 14 16:19:03 2021
GPGPU-Sim PTX: 3900000 instructions simulated : ctaid=(10,10,0) tid=(5,3,0)
GPGPU-Sim PTX: 4000000 instructions simulated : ctaid=(2,8,0) tid=(1,3,0)
GPGPU-Sim PTX: 4100000 instructions simulated : ctaid=(8,14,0) tid=(9,1,0)
GPGPU-Sim uArch: cycles simulated: 9000  inst.: 4145552 (ipc=460.6) sim_rate=218186 (inst/sec) elapsed = 0:0:00:19 / Mon Jun 14 16:19:04 2021
GPGPU-Sim PTX: 4200000 instructions simulated : ctaid=(13,11,0) tid=(3,8,0)
GPGPU-Sim PTX: 4300000 instructions simulated : ctaid=(9,1,0) tid=(3,2,0)
GPGPU-Sim PTX: 4400000 instructions simulated : ctaid=(7,5,0) tid=(1,1,0)
GPGPU-Sim PTX: 4500000 instructions simulated : ctaid=(4,6,0) tid=(1,9,0)
GPGPU-Sim uArch: cycles simulated: 9500  inst.: 4494464 (ipc=473.1) sim_rate=224723 (inst/sec) elapsed = 0:0:00:20 / Mon Jun 14 16:19:05 2021
GPGPU-Sim PTX: 4600000 instructions simulated : ctaid=(10,0,0) tid=(9,9,0)
GPGPU-Sim PTX: 4700000 instructions simulated : ctaid=(3,14,0) tid=(9,3,0)
GPGPU-Sim PTX: 4800000 instructions simulated : ctaid=(11,7,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 10000  inst.: 4846372 (ipc=484.6) sim_rate=230779 (inst/sec) elapsed = 0:0:00:21 / Mon Jun 14 16:19:06 2021
GPGPU-Sim PTX: 4900000 instructions simulated : ctaid=(3,10,0) tid=(5,3,0)
GPGPU-Sim PTX: 5000000 instructions simulated : ctaid=(10,14,0) tid=(5,9,0)
GPGPU-Sim PTX: 5100000 instructions simulated : ctaid=(8,10,0) tid=(1,3,0)
GPGPU-Sim PTX: 5200000 instructions simulated : ctaid=(11,1,0) tid=(1,9,0)
GPGPU-Sim uArch: cycles simulated: 10500  inst.: 5179948 (ipc=493.3) sim_rate=225215 (inst/sec) elapsed = 0:0:00:23 / Mon Jun 14 16:19:08 2021
GPGPU-Sim PTX: 5300000 instructions simulated : ctaid=(7,0,0) tid=(9,3,0)
GPGPU-Sim PTX: 5400000 instructions simulated : ctaid=(9,7,0) tid=(9,5,0)
GPGPU-Sim PTX: 5500000 instructions simulated : ctaid=(1,14,0) tid=(7,6,0)
GPGPU-Sim uArch: cycles simulated: 11000  inst.: 5542924 (ipc=503.9) sim_rate=230955 (inst/sec) elapsed = 0:0:00:24 / Mon Jun 14 16:19:09 2021
GPGPU-Sim PTX: 5600000 instructions simulated : ctaid=(3,8,0) tid=(7,8,0)
GPGPU-Sim PTX: 5700000 instructions simulated : ctaid=(8,0,0) tid=(3,8,0)
GPGPU-Sim PTX: 5800000 instructions simulated : ctaid=(2,7,0) tid=(1,3,0)
GPGPU-Sim PTX: 5900000 instructions simulated : ctaid=(4,12,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 11500  inst.: 5863464 (ipc=509.9) sim_rate=234538 (inst/sec) elapsed = 0:0:00:25 / Mon Jun 14 16:19:10 2021
GPGPU-Sim PTX: 6000000 instructions simulated : ctaid=(10,4,0) tid=(7,4,0)
GPGPU-Sim PTX: 6100000 instructions simulated : ctaid=(7,9,0) tid=(5,7,0)
GPGPU-Sim PTX: 6200000 instructions simulated : ctaid=(9,7,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 12000  inst.: 6198952 (ipc=516.6) sim_rate=238421 (inst/sec) elapsed = 0:0:00:26 / Mon Jun 14 16:19:11 2021
GPGPU-Sim PTX: 6300000 instructions simulated : ctaid=(0,11,0) tid=(9,7,0)
GPGPU-Sim PTX: 6400000 instructions simulated : ctaid=(11,5,0) tid=(9,3,0)
GPGPU-Sim PTX: 6500000 instructions simulated : ctaid=(6,10,0) tid=(3,0,0)
GPGPU-Sim uArch: cycles simulated: 12500  inst.: 6514828 (ipc=521.2) sim_rate=241289 (inst/sec) elapsed = 0:0:00:27 / Mon Jun 14 16:19:12 2021
GPGPU-Sim PTX: 6600000 instructions simulated : ctaid=(9,0,0) tid=(3,6,0)
GPGPU-Sim PTX: 6700000 instructions simulated : ctaid=(7,2,0) tid=(5,9,0)
GPGPU-Sim PTX: 6800000 instructions simulated : ctaid=(1,14,0) tid=(9,5,0)
GPGPU-Sim PTX: 6900000 instructions simulated : ctaid=(9,14,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 13000  inst.: 6850760 (ipc=527.0) sim_rate=244670 (inst/sec) elapsed = 0:0:00:28 / Mon Jun 14 16:19:13 2021
GPGPU-Sim PTX: 7000000 instructions simulated : ctaid=(5,1,0) tid=(1,7,0)
GPGPU-Sim PTX: 7100000 instructions simulated : ctaid=(13,0,0) tid=(5,1,0)
GPGPU-Sim PTX: 7200000 instructions simulated : ctaid=(10,11,0) tid=(9,3,0)
GPGPU-Sim uArch: cycles simulated: 13500  inst.: 7177796 (ipc=531.7) sim_rate=247510 (inst/sec) elapsed = 0:0:00:29 / Mon Jun 14 16:19:14 2021
GPGPU-Sim PTX: 7300000 instructions simulated : ctaid=(3,5,0) tid=(3,4,0)
GPGPU-Sim PTX: 7400000 instructions simulated : ctaid=(1,12,0) tid=(3,0,0)
GPGPU-Sim PTX: 7500000 instructions simulated : ctaid=(2,12,0) tid=(3,2,0)
GPGPU-Sim uArch: cycles simulated: 14000  inst.: 7513232 (ipc=536.7) sim_rate=242362 (inst/sec) elapsed = 0:0:00:31 / Mon Jun 14 16:19:16 2021
GPGPU-Sim PTX: 7600000 instructions simulated : ctaid=(12,4,0) tid=(3,8,0)
GPGPU-Sim PTX: 7700000 instructions simulated : ctaid=(5,6,0) tid=(5,9,0)
GPGPU-Sim PTX: 7800000 instructions simulated : ctaid=(10,0,0) tid=(7,4,0)
GPGPU-Sim PTX: 7900000 instructions simulated : ctaid=(11,5,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 14500  inst.: 7861928 (ipc=542.2) sim_rate=245685 (inst/sec) elapsed = 0:0:00:32 / Mon Jun 14 16:19:17 2021
GPGPU-Sim PTX: 8000000 instructions simulated : ctaid=(13,13,0) tid=(5,5,0)
GPGPU-Sim PTX: 8100000 instructions simulated : ctaid=(8,7,0) tid=(1,3,0)
GPGPU-Sim PTX: 8200000 instructions simulated : ctaid=(9,13,0) tid=(7,6,0)
GPGPU-Sim uArch: cycles simulated: 15000  inst.: 8177372 (ipc=545.2) sim_rate=247799 (inst/sec) elapsed = 0:0:00:33 / Mon Jun 14 16:19:18 2021
GPGPU-Sim PTX: 8300000 instructions simulated : ctaid=(4,4,0) tid=(3,0,0)
GPGPU-Sim PTX: 8400000 instructions simulated : ctaid=(10,10,0) tid=(1,7,0)
GPGPU-Sim PTX: 8500000 instructions simulated : ctaid=(0,9,0) tid=(9,9,0)
GPGPU-Sim uArch: cycles simulated: 15500  inst.: 8534132 (ipc=550.6) sim_rate=251003 (inst/sec) elapsed = 0:0:00:34 / Mon Jun 14 16:19:19 2021
GPGPU-Sim PTX: 8600000 instructions simulated : ctaid=(12,12,0) tid=(1,9,0)
GPGPU-Sim PTX: 8700000 instructions simulated : ctaid=(4,3,0) tid=(1,3,0)
GPGPU-Sim PTX: 8800000 instructions simulated : ctaid=(9,14,0) tid=(1,1,0)
GPGPU-Sim uArch: cycles simulated: 16000  inst.: 8846628 (ipc=552.9) sim_rate=252760 (inst/sec) elapsed = 0:0:00:35 / Mon Jun 14 16:19:20 2021
GPGPU-Sim PTX: 8900000 instructions simulated : ctaid=(0,1,0) tid=(7,4,0)
GPGPU-Sim PTX: 9000000 instructions simulated : ctaid=(10,11,0) tid=(1,7,0)
GPGPU-Sim PTX: 9100000 instructions simulated : ctaid=(12,14,0) tid=(1,9,0)
GPGPU-Sim PTX: 9200000 instructions simulated : ctaid=(10,7,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 16500  inst.: 9193752 (ipc=557.2) sim_rate=255382 (inst/sec) elapsed = 0:0:00:36 / Mon Jun 14 16:19:21 2021
GPGPU-Sim PTX: 9300000 instructions simulated : ctaid=(10,2,0) tid=(1,9,0)
GPGPU-Sim PTX: 9400000 instructions simulated : ctaid=(2,12,0) tid=(1,9,0)
GPGPU-Sim PTX: 9500000 instructions simulated : ctaid=(3,9,0) tid=(7,8,0)
GPGPU-Sim uArch: cycles simulated: 17000  inst.: 9519480 (ipc=560.0) sim_rate=250512 (inst/sec) elapsed = 0:0:00:38 / Mon Jun 14 16:19:23 2021
GPGPU-Sim PTX: 9600000 instructions simulated : ctaid=(12,9,0) tid=(5,9,0)
GPGPU-Sim PTX: 9700000 instructions simulated : ctaid=(1,3,0) tid=(7,0,0)
GPGPU-Sim PTX: 9800000 instructions simulated : ctaid=(7,0,0) tid=(5,7,0)
GPGPU-Sim uArch: cycles simulated: 17500  inst.: 9845216 (ipc=562.6) sim_rate=252441 (inst/sec) elapsed = 0:0:00:39 / Mon Jun 14 16:19:24 2021
GPGPU-Sim PTX: 9900000 instructions simulated : ctaid=(1,6,0) tid=(3,2,0)
GPGPU-Sim PTX: 10000000 instructions simulated : ctaid=(10,13,0) tid=(9,1,0)
GPGPU-Sim PTX: 10100000 instructions simulated : ctaid=(10,10,0) tid=(1,5,0)
GPGPU-Sim PTX: 10200000 instructions simulated : ctaid=(2,4,0) tid=(1,5,0)
GPGPU-Sim uArch: cycles simulated: 18000  inst.: 10175904 (ipc=565.3) sim_rate=254397 (inst/sec) elapsed = 0:0:00:40 / Mon Jun 14 16:19:25 2021
GPGPU-Sim PTX: 10300000 instructions simulated : ctaid=(10,6,0) tid=(9,7,0)
GPGPU-Sim PTX: 10400000 instructions simulated : ctaid=(8,8,0) tid=(9,9,0)
GPGPU-Sim PTX: 10500000 instructions simulated : ctaid=(13,8,0) tid=(3,0,0)
GPGPU-Sim uArch: cycles simulated: 18500  inst.: 10526504 (ipc=569.0) sim_rate=256744 (inst/sec) elapsed = 0:0:00:41 / Mon Jun 14 16:19:26 2021
GPGPU-Sim PTX: 10600000 instructions simulated : ctaid=(13,12,0) tid=(5,7,0)
GPGPU-Sim PTX: 10700000 instructions simulated : ctaid=(14,2,0) tid=(7,4,0)
GPGPU-Sim PTX: 10800000 instructions simulated : ctaid=(9,1,0) tid=(3,6,0)
GPGPU-Sim PTX: 10900000 instructions simulated : ctaid=(14,6,0) tid=(7,8,0)
GPGPU-Sim uArch: cycles simulated: 19000  inst.: 10861260 (ipc=571.6) sim_rate=258601 (inst/sec) elapsed = 0:0:00:42 / Mon Jun 14 16:19:27 2021
GPGPU-Sim PTX: 11000000 instructions simulated : ctaid=(3,13,0) tid=(1,3,0)
GPGPU-Sim PTX: 11100000 instructions simulated : ctaid=(13,11,0) tid=(7,6,0)
GPGPU-Sim PTX: 11200000 instructions simulated : ctaid=(11,6,0) tid=(7,6,0)
GPGPU-Sim uArch: cycles simulated: 19500  inst.: 11182232 (ipc=573.4) sim_rate=254141 (inst/sec) elapsed = 0:0:00:44 / Mon Jun 14 16:19:29 2021
GPGPU-Sim PTX: 11300000 instructions simulated : ctaid=(7,7,0) tid=(5,9,0)
GPGPU-Sim PTX: 11400000 instructions simulated : ctaid=(9,3,0) tid=(9,1,0)
GPGPU-Sim PTX: 11500000 instructions simulated : ctaid=(14,10,0) tid=(1,5,0)
GPGPU-Sim uArch: cycles simulated: 20000  inst.: 11510844 (ipc=575.5) sim_rate=255796 (inst/sec) elapsed = 0:0:00:45 / Mon Jun 14 16:19:30 2021
GPGPU-Sim PTX: 11600000 instructions simulated : ctaid=(2,13,0) tid=(3,0,0)
GPGPU-Sim PTX: 11700000 instructions simulated : ctaid=(10,10,0) tid=(5,3,0)
GPGPU-Sim PTX: 11800000 instructions simulated : ctaid=(1,6,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 20500  inst.: 11845228 (ipc=577.8) sim_rate=257504 (inst/sec) elapsed = 0:0:00:46 / Mon Jun 14 16:19:31 2021
GPGPU-Sim PTX: 11900000 instructions simulated : ctaid=(12,9,0) tid=(5,1,0)
GPGPU-Sim PTX: 12000000 instructions simulated : ctaid=(9,2,0) tid=(9,5,0)
GPGPU-Sim PTX: 12100000 instructions simulated : ctaid=(10,1,0) tid=(7,6,0)
GPGPU-Sim PTX: 12200000 instructions simulated : ctaid=(12,2,0) tid=(3,2,0)
GPGPU-Sim uArch: cycles simulated: 21000  inst.: 12183192 (ipc=580.2) sim_rate=259216 (inst/sec) elapsed = 0:0:00:47 / Mon Jun 14 16:19:32 2021
GPGPU-Sim PTX: 12300000 instructions simulated : ctaid=(7,7,0) tid=(7,6,0)
GPGPU-Sim PTX: 12400000 instructions simulated : ctaid=(9,1,0) tid=(3,0,0)
GPGPU-Sim PTX: 12500000 instructions simulated : ctaid=(14,10,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 21500  inst.: 12511912 (ipc=581.9) sim_rate=260664 (inst/sec) elapsed = 0:0:00:48 / Mon Jun 14 16:19:33 2021
GPGPU-Sim PTX: 12600000 instructions simulated : ctaid=(13,7,0) tid=(9,9,0)
GPGPU-Sim PTX: 12700000 instructions simulated : ctaid=(0,13,0) tid=(5,5,0)
GPGPU-Sim PTX: 12800000 instructions simulated : ctaid=(5,9,0) tid=(1,9,0)
GPGPU-Sim PTX: 12900000 instructions simulated : ctaid=(5,13,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 22000  inst.: 12861064 (ipc=584.6) sim_rate=262470 (inst/sec) elapsed = 0:0:00:49 / Mon Jun 14 16:19:34 2021
GPGPU-Sim PTX: 13000000 instructions simulated : ctaid=(14,11,0) tid=(9,7,0)
GPGPU-Sim PTX: 13100000 instructions simulated : ctaid=(1,14,0) tid=(9,3,0)
GPGPU-Sim PTX: 13200000 instructions simulated : ctaid=(11,12,0) tid=(3,6,0)
GPGPU-Sim uArch: cycles simulated: 22500  inst.: 13200500 (ipc=586.7) sim_rate=258833 (inst/sec) elapsed = 0:0:00:51 / Mon Jun 14 16:19:36 2021
GPGPU-Sim PTX: 13300000 instructions simulated : ctaid=(11,1,0) tid=(7,0,0)
GPGPU-Sim PTX: 13400000 instructions simulated : ctaid=(11,4,0) tid=(5,9,0)
GPGPU-Sim PTX: 13500000 instructions simulated : ctaid=(3,13,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 23000  inst.: 13520992 (ipc=587.9) sim_rate=260019 (inst/sec) elapsed = 0:0:00:52 / Mon Jun 14 16:19:37 2021
GPGPU-Sim PTX: 13600000 instructions simulated : ctaid=(5,12,0) tid=(9,5,0)
GPGPU-Sim PTX: 13700000 instructions simulated : ctaid=(3,0,0) tid=(9,7,0)
GPGPU-Sim PTX: 13800000 instructions simulated : ctaid=(1,2,0) tid=(3,6,0)
GPGPU-Sim uArch: cycles simulated: 23500  inst.: 13840084 (ipc=588.9) sim_rate=261133 (inst/sec) elapsed = 0:0:00:53 / Mon Jun 14 16:19:38 2021
GPGPU-Sim PTX: 13900000 instructions simulated : ctaid=(11,14,0) tid=(1,9,0)
GPGPU-Sim PTX: 14000000 instructions simulated : ctaid=(7,6,0) tid=(5,1,0)
GPGPU-Sim PTX: 14100000 instructions simulated : ctaid=(9,13,0) tid=(7,2,0)
GPGPU-Sim PTX: 14200000 instructions simulated : ctaid=(14,8,0) tid=(5,1,0)
GPGPU-Sim uArch: cycles simulated: 24000  inst.: 14188812 (ipc=591.2) sim_rate=262755 (inst/sec) elapsed = 0:0:00:54 / Mon Jun 14 16:19:39 2021
GPGPU-Sim PTX: 14300000 instructions simulated : ctaid=(10,11,0) tid=(7,0,0)
GPGPU-Sim PTX: 14400000 instructions simulated : ctaid=(13,13,0) tid=(3,2,0)
GPGPU-Sim PTX: 14500000 instructions simulated : ctaid=(3,10,0) tid=(3,2,0)
GPGPU-Sim uArch: cycles simulated: 24500  inst.: 14517376 (ipc=592.5) sim_rate=263952 (inst/sec) elapsed = 0:0:00:55 / Mon Jun 14 16:19:40 2021
GPGPU-Sim PTX: 14600000 instructions simulated : ctaid=(12,7,0) tid=(5,1,0)
GPGPU-Sim PTX: 14700000 instructions simulated : ctaid=(7,7,0) tid=(5,7,0)
GPGPU-Sim PTX: 14800000 instructions simulated : ctaid=(13,8,0) tid=(3,4,0)
GPGPU-Sim uArch: cycles simulated: 25000  inst.: 14846428 (ipc=593.9) sim_rate=265114 (inst/sec) elapsed = 0:0:00:56 / Mon Jun 14 16:19:41 2021
GPGPU-Sim PTX: 14900000 instructions simulated : ctaid=(9,5,0) tid=(9,9,0)
GPGPU-Sim PTX: 15000000 instructions simulated : ctaid=(2,8,0) tid=(3,8,0)
GPGPU-Sim PTX: 15100000 instructions simulated : ctaid=(8,13,0) tid=(9,5,0)
GPGPU-Sim PTX: 15200000 instructions simulated : ctaid=(4,12,0) tid=(1,1,0)
GPGPU-Sim uArch: cycles simulated: 25500  inst.: 15182476 (ipc=595.4) sim_rate=266359 (inst/sec) elapsed = 0:0:00:57 / Mon Jun 14 16:19:42 2021
GPGPU-Sim PTX: 15300000 instructions simulated : ctaid=(6,12,0) tid=(3,2,0)
GPGPU-Sim PTX: 15400000 instructions simulated : ctaid=(13,0,0) tid=(1,7,0)
GPGPU-Sim PTX: 15500000 instructions simulated : ctaid=(10,1,0) tid=(1,1,0)
GPGPU-Sim uArch: cycles simulated: 26000  inst.: 15506780 (ipc=596.4) sim_rate=262826 (inst/sec) elapsed = 0:0:00:59 / Mon Jun 14 16:19:44 2021
GPGPU-Sim PTX: 15600000 instructions simulated : ctaid=(2,11,0) tid=(1,1,0)
GPGPU-Sim PTX: 15700000 instructions simulated : ctaid=(5,11,0) tid=(9,7,0)
GPGPU-Sim PTX: 15800000 instructions simulated : ctaid=(2,13,0) tid=(9,1,0)
GPGPU-Sim PTX: 15900000 instructions simulated : ctaid=(6,5,0) tid=(9,1,0)
GPGPU-Sim uArch: cycles simulated: 26500  inst.: 15853292 (ipc=598.2) sim_rate=264221 (inst/sec) elapsed = 0:0:01:00 / Mon Jun 14 16:19:45 2021
GPGPU-Sim PTX: 16000000 instructions simulated : ctaid=(13,2,0) tid=(7,2,0)
GPGPU-Sim PTX: 16100000 instructions simulated : ctaid=(14,0,0) tid=(1,9,0)
GPGPU-Sim PTX: 16200000 instructions simulated : ctaid=(12,10,0) tid=(9,7,0)
GPGPU-Sim uArch: cycles simulated: 27000  inst.: 16182996 (ipc=599.4) sim_rate=265295 (inst/sec) elapsed = 0:0:01:01 / Mon Jun 14 16:19:46 2021
GPGPU-Sim PTX: 16300000 instructions simulated : ctaid=(5,10,0) tid=(3,6,0)
GPGPU-Sim PTX: 16400000 instructions simulated : ctaid=(13,10,0) tid=(7,0,0)
GPGPU-Sim PTX: 16500000 instructions simulated : ctaid=(14,13,0) tid=(5,7,0)
GPGPU-Sim uArch: cycles simulated: 27500  inst.: 16529016 (ipc=601.1) sim_rate=266597 (inst/sec) elapsed = 0:0:01:02 / Mon Jun 14 16:19:47 2021
GPGPU-Sim PTX: 16600000 instructions simulated : ctaid=(11,10,0) tid=(3,8,0)
GPGPU-Sim PTX: 16700000 instructions simulated : ctaid=(12,6,0) tid=(5,9,0)
GPGPU-Sim PTX: 16800000 instructions simulated : ctaid=(1,12,0) tid=(7,8,0)
GPGPU-Sim PTX: 16900000 instructions simulated : ctaid=(9,11,0) tid=(7,8,0)
GPGPU-Sim uArch: cycles simulated: 28000  inst.: 16859168 (ipc=602.1) sim_rate=267605 (inst/sec) elapsed = 0:0:01:03 / Mon Jun 14 16:19:48 2021
GPGPU-Sim PTX: 17000000 instructions simulated : ctaid=(5,5,0) tid=(5,5,0)
GPGPU-Sim PTX: 17100000 instructions simulated : ctaid=(7,11,0) tid=(1,9,0)
GPGPU-Sim PTX: 17200000 instructions simulated : ctaid=(6,7,0) tid=(1,1,0)
GPGPU-Sim uArch: cycles simulated: 28500  inst.: 17206480 (ipc=603.7) sim_rate=268851 (inst/sec) elapsed = 0:0:01:04 / Mon Jun 14 16:19:49 2021
GPGPU-Sim PTX: 17300000 instructions simulated : ctaid=(14,3,0) tid=(9,1,0)
GPGPU-Sim PTX: 17400000 instructions simulated : ctaid=(1,11,0) tid=(9,1,0)
GPGPU-Sim PTX: 17500000 instructions simulated : ctaid=(9,3,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 29000  inst.: 17530396 (ipc=604.5) sim_rate=265612 (inst/sec) elapsed = 0:0:01:06 / Mon Jun 14 16:19:51 2021
GPGPU-Sim PTX: 17600000 instructions simulated : ctaid=(8,5,0) tid=(9,7,0)
GPGPU-Sim PTX: 17700000 instructions simulated : ctaid=(9,2,0) tid=(1,3,0)
GPGPU-Sim PTX: 17800000 instructions simulated : ctaid=(11,11,0) tid=(5,9,0)
GPGPU-Sim PTX: 17900000 instructions simulated : ctaid=(14,8,0) tid=(7,0,0)
GPGPU-Sim uArch: cycles simulated: 29500  inst.: 17869380 (ipc=605.7) sim_rate=266707 (inst/sec) elapsed = 0:0:01:07 / Mon Jun 14 16:19:52 2021
GPGPU-Sim PTX: 18000000 instructions simulated : ctaid=(11,8,0) tid=(1,1,0)
GPGPU-Sim PTX: 18100000 instructions simulated : ctaid=(9,9,0) tid=(3,0,0)
GPGPU-Sim PTX: 18200000 instructions simulated : ctaid=(12,13,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 30000  inst.: 18174448 (ipc=605.8) sim_rate=267271 (inst/sec) elapsed = 0:0:01:08 / Mon Jun 14 16:19:53 2021
GPGPU-Sim PTX: 18300000 instructions simulated : ctaid=(4,13,0) tid=(3,8,0)
GPGPU-Sim PTX: 18400000 instructions simulated : ctaid=(1,14,0) tid=(9,1,0)
GPGPU-Sim PTX: 18500000 instructions simulated : ctaid=(0,0,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 30500  inst.: 18511332 (ipc=606.9) sim_rate=268280 (inst/sec) elapsed = 0:0:01:09 / Mon Jun 14 16:19:54 2021
GPGPU-Sim PTX: 18600000 instructions simulated : ctaid=(11,5,0) tid=(7,6,0)
GPGPU-Sim PTX: 18700000 instructions simulated : ctaid=(5,8,0) tid=(7,4,0)
GPGPU-Sim PTX: 18800000 instructions simulated : ctaid=(3,14,0) tid=(9,3,0)
GPGPU-Sim PTX: 18900000 instructions simulated : ctaid=(1,10,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 31000  inst.: 18869068 (ipc=608.7) sim_rate=269558 (inst/sec) elapsed = 0:0:01:10 / Mon Jun 14 16:19:55 2021
GPGPU-Sim PTX: 19000000 instructions simulated : ctaid=(11,0,0) tid=(9,3,0)
GPGPU-Sim PTX: 19100000 instructions simulated : ctaid=(1,4,0) tid=(7,8,0)
GPGPU-Sim PTX: 19200000 instructions simulated : ctaid=(4,9,0) tid=(9,3,0)
GPGPU-Sim uArch: cycles simulated: 31500  inst.: 19201680 (ipc=609.6) sim_rate=270446 (inst/sec) elapsed = 0:0:01:11 / Mon Jun 14 16:19:56 2021
GPGPU-Sim PTX: 19300000 instructions simulated : ctaid=(6,13,0) tid=(9,9,0)
GPGPU-Sim PTX: 19400000 instructions simulated : ctaid=(11,0,0) tid=(9,1,0)
GPGPU-Sim PTX: 19500000 instructions simulated : ctaid=(9,2,0) tid=(1,5,0)
GPGPU-Sim uArch: cycles simulated: 32000  inst.: 19534468 (ipc=610.5) sim_rate=271312 (inst/sec) elapsed = 0:0:01:12 / Mon Jun 14 16:19:57 2021
GPGPU-Sim PTX: 19600000 instructions simulated : ctaid=(8,2,0) tid=(1,5,0)
GPGPU-Sim PTX: 19700000 instructions simulated : ctaid=(11,14,0) tid=(7,6,0)
GPGPU-Sim PTX: 19800000 instructions simulated : ctaid=(12,2,0) tid=(7,0,0)
GPGPU-Sim uArch: cycles simulated: 32500  inst.: 19828596 (ipc=610.1) sim_rate=271624 (inst/sec) elapsed = 0:0:01:13 / Mon Jun 14 16:19:58 2021
GPGPU-Sim PTX: 19900000 instructions simulated : ctaid=(4,3,0) tid=(3,6,0)
GPGPU-Sim PTX: 20000000 instructions simulated : ctaid=(11,5,0) tid=(9,3,0)
GPGPU-Sim PTX: 20100000 instructions simulated : ctaid=(1,1,0) tid=(1,3,0)
GPGPU-Sim PTX: 20200000 instructions simulated : ctaid=(4,2,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 33000  inst.: 20172780 (ipc=611.3) sim_rate=268970 (inst/sec) elapsed = 0:0:01:15 / Mon Jun 14 16:20:00 2021
GPGPU-Sim PTX: 20300000 instructions simulated : ctaid=(9,5,0) tid=(1,1,0)
GPGPU-Sim PTX: 20400000 instructions simulated : ctaid=(10,10,0) tid=(9,5,0)
GPGPU-Sim PTX: 20500000 instructions simulated : ctaid=(9,4,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 33500  inst.: 20511528 (ipc=612.3) sim_rate=269888 (inst/sec) elapsed = 0:0:01:16 / Mon Jun 14 16:20:01 2021
GPGPU-Sim PTX: 20600000 instructions simulated : ctaid=(6,12,0) tid=(1,7,0)
GPGPU-Sim PTX: 20700000 instructions simulated : ctaid=(12,12,0) tid=(7,2,0)
GPGPU-Sim PTX: 20800000 instructions simulated : ctaid=(8,4,0) tid=(9,7,0)
GPGPU-Sim PTX: 20900000 instructions simulated : ctaid=(8,6,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 34000  inst.: 20866696 (ipc=613.7) sim_rate=270996 (inst/sec) elapsed = 0:0:01:17 / Mon Jun 14 16:20:02 2021
GPGPU-Sim PTX: 21000000 instructions simulated : ctaid=(8,9,0) tid=(5,9,0)
GPGPU-Sim PTX: 21100000 instructions simulated : ctaid=(7,2,0) tid=(9,1,0)
GPGPU-Sim PTX: 21200000 instructions simulated : ctaid=(1,3,0) tid=(5,3,0)
GPGPU-Sim uArch: cycles simulated: 34500  inst.: 21196192 (ipc=614.4) sim_rate=271746 (inst/sec) elapsed = 0:0:01:18 / Mon Jun 14 16:20:03 2021
GPGPU-Sim PTX: 21300000 instructions simulated : ctaid=(6,6,0) tid=(5,9,0)
GPGPU-Sim PTX: 21400000 instructions simulated : ctaid=(4,2,0) tid=(9,5,0)
GPGPU-Sim PTX: 21500000 instructions simulated : ctaid=(2,5,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 35000  inst.: 21528812 (ipc=615.1) sim_rate=272516 (inst/sec) elapsed = 0:0:01:19 / Mon Jun 14 16:20:04 2021
GPGPU-Sim PTX: 21600000 instructions simulated : ctaid=(12,14,0) tid=(1,3,0)
GPGPU-Sim PTX: 21700000 instructions simulated : ctaid=(11,9,0) tid=(9,7,0)
GPGPU-Sim PTX: 21800000 instructions simulated : ctaid=(5,14,0) tid=(3,0,0)
GPGPU-Sim PTX: 21900000 instructions simulated : ctaid=(12,3,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 35500  inst.: 21850644 (ipc=615.5) sim_rate=273133 (inst/sec) elapsed = 0:0:01:20 / Mon Jun 14 16:20:05 2021
GPGPU-Sim PTX: 22000000 instructions simulated : ctaid=(5,2,0) tid=(9,9,0)
GPGPU-Sim PTX: 22100000 instructions simulated : ctaid=(6,0,0) tid=(5,9,0)
GPGPU-Sim PTX: 22200000 instructions simulated : ctaid=(11,9,0) tid=(7,4,0)
GPGPU-Sim uArch: cycles simulated: 36000  inst.: 22180956 (ipc=616.1) sim_rate=273838 (inst/sec) elapsed = 0:0:01:21 / Mon Jun 14 16:20:06 2021
GPGPU-Sim PTX: 22300000 instructions simulated : ctaid=(2,1,0) tid=(5,3,0)
GPGPU-Sim PTX: 22400000 instructions simulated : ctaid=(1,5,0) tid=(5,3,0)
GPGPU-Sim PTX: 22500000 instructions simulated : ctaid=(2,4,0) tid=(7,0,0)
GPGPU-Sim uArch: cycles simulated: 36500  inst.: 22532600 (ipc=617.3) sim_rate=271477 (inst/sec) elapsed = 0:0:01:23 / Mon Jun 14 16:20:08 2021
GPGPU-Sim PTX: 22600000 instructions simulated : ctaid=(14,9,0) tid=(1,7,0)
GPGPU-Sim PTX: 22700000 instructions simulated : ctaid=(2,12,0) tid=(1,7,0)
GPGPU-Sim PTX: 22800000 instructions simulated : ctaid=(5,14,0) tid=(9,5,0)
GPGPU-Sim PTX: 22900000 instructions simulated : ctaid=(4,0,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 37000  inst.: 22872968 (ipc=618.2) sim_rate=272297 (inst/sec) elapsed = 0:0:01:24 / Mon Jun 14 16:20:09 2021
GPGPU-Sim PTX: 23000000 instructions simulated : ctaid=(13,13,0) tid=(7,2,0)
GPGPU-Sim PTX: 23100000 instructions simulated : ctaid=(6,3,0) tid=(3,0,0)
GPGPU-Sim PTX: 23200000 instructions simulated : ctaid=(3,10,0) tid=(3,6,0)
GPGPU-Sim uArch: cycles simulated: 37500  inst.: 23195320 (ipc=618.5) sim_rate=272886 (inst/sec) elapsed = 0:0:01:25 / Mon Jun 14 16:20:10 2021
GPGPU-Sim PTX: 23300000 instructions simulated : ctaid=(8,6,0) tid=(9,1,0)
GPGPU-Sim PTX: 23400000 instructions simulated : ctaid=(1,0,0) tid=(9,7,0)
GPGPU-Sim PTX: 23500000 instructions simulated : ctaid=(0,10,0) tid=(7,8,0)
GPGPU-Sim uArch: cycles simulated: 38000  inst.: 23505996 (ipc=618.6) sim_rate=273325 (inst/sec) elapsed = 0:0:01:26 / Mon Jun 14 16:20:11 2021
GPGPU-Sim PTX: 23600000 instructions simulated : ctaid=(5,11,0) tid=(1,1,0)
GPGPU-Sim PTX: 23700000 instructions simulated : ctaid=(13,0,0) tid=(9,3,0)
GPGPU-Sim PTX: 23800000 instructions simulated : ctaid=(1,10,0) tid=(9,1,0)
GPGPU-Sim uArch: cycles simulated: 38500  inst.: 23837924 (ipc=619.2) sim_rate=273999 (inst/sec) elapsed = 0:0:01:27 / Mon Jun 14 16:20:12 2021
GPGPU-Sim PTX: 23900000 instructions simulated : ctaid=(4,3,0) tid=(5,1,0)
GPGPU-Sim PTX: 24000000 instructions simulated : ctaid=(13,7,0) tid=(1,9,0)
GPGPU-Sim PTX: 24100000 instructions simulated : ctaid=(6,1,0) tid=(9,7,0)
GPGPU-Sim PTX: 24200000 instructions simulated : ctaid=(5,11,0) tid=(5,3,0)
GPGPU-Sim uArch: cycles simulated: 39000  inst.: 24174400 (ipc=619.9) sim_rate=271622 (inst/sec) elapsed = 0:0:01:29 / Mon Jun 14 16:20:14 2021
GPGPU-Sim PTX: 24300000 instructions simulated : ctaid=(13,6,0) tid=(7,2,0)
GPGPU-Sim PTX: 24400000 instructions simulated : ctaid=(5,7,0) tid=(5,1,0)
GPGPU-Sim PTX: 24500000 instructions simulated : ctaid=(10,14,0) tid=(7,6,0)
GPGPU-Sim uArch: cycles simulated: 39500  inst.: 24517452 (ipc=620.7) sim_rate=272416 (inst/sec) elapsed = 0:0:01:30 / Mon Jun 14 16:20:15 2021
GPGPU-Sim PTX: 24600000 instructions simulated : ctaid=(7,7,0) tid=(1,9,0)
GPGPU-Sim PTX: 24700000 instructions simulated : ctaid=(0,8,0) tid=(3,8,0)
GPGPU-Sim PTX: 24800000 instructions simulated : ctaid=(8,8,0) tid=(1,5,0)
GPGPU-Sim uArch: cycles simulated: 40000  inst.: 24838280 (ipc=621.0) sim_rate=272948 (inst/sec) elapsed = 0:0:01:31 / Mon Jun 14 16:20:16 2021
GPGPU-Sim PTX: 24900000 instructions simulated : ctaid=(13,2,0) tid=(1,9,0)
GPGPU-Sim PTX: 25000000 instructions simulated : ctaid=(10,0,0) tid=(3,4,0)
GPGPU-Sim PTX: 25100000 instructions simulated : ctaid=(4,5,0) tid=(9,1,0)
GPGPU-Sim PTX: 25200000 instructions simulated : ctaid=(11,13,0) tid=(5,1,0)
GPGPU-Sim uArch: cycles simulated: 40500  inst.: 25182424 (ipc=621.8) sim_rate=273722 (inst/sec) elapsed = 0:0:01:32 / Mon Jun 14 16:20:17 2021
GPGPU-Sim PTX: 25300000 instructions simulated : ctaid=(14,12,0) tid=(1,1,0)
GPGPU-Sim PTX: 25400000 instructions simulated : ctaid=(4,3,0) tid=(1,1,0)
GPGPU-Sim PTX: 25500000 instructions simulated : ctaid=(6,10,0) tid=(3,6,0)
GPGPU-Sim uArch: cycles simulated: 41000  inst.: 25501984 (ipc=622.0) sim_rate=274214 (inst/sec) elapsed = 0:0:01:33 / Mon Jun 14 16:20:18 2021
GPGPU-Sim PTX: 25600000 instructions simulated : ctaid=(14,0,0) tid=(3,6,0)
GPGPU-Sim PTX: 25700000 instructions simulated : ctaid=(7,3,0) tid=(7,2,0)
GPGPU-Sim PTX: 25800000 instructions simulated : ctaid=(1,8,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 41500  inst.: 25839444 (ipc=622.6) sim_rate=274887 (inst/sec) elapsed = 0:0:01:34 / Mon Jun 14 16:20:19 2021
GPGPU-Sim PTX: 25900000 instructions simulated : ctaid=(13,4,0) tid=(7,0,0)
GPGPU-Sim PTX: 26000000 instructions simulated : ctaid=(12,14,0) tid=(5,5,0)
GPGPU-Sim PTX: 26100000 instructions simulated : ctaid=(8,14,0) tid=(1,7,0)
GPGPU-Sim PTX: 26200000 instructions simulated : ctaid=(8,0,0) tid=(3,6,0)
GPGPU-Sim uArch: cycles simulated: 42000  inst.: 26158016 (ipc=622.8) sim_rate=275347 (inst/sec) elapsed = 0:0:01:35 / Mon Jun 14 16:20:20 2021
GPGPU-Sim PTX: 26300000 instructions simulated : ctaid=(6,9,0) tid=(9,1,0)
GPGPU-Sim PTX: 26400000 instructions simulated : ctaid=(4,6,0) tid=(9,9,0)
GPGPU-Sim PTX: 26500000 instructions simulated : ctaid=(7,14,0) tid=(7,8,0)
GPGPU-Sim uArch: cycles simulated: 42500  inst.: 26488808 (ipc=623.3) sim_rate=275925 (inst/sec) elapsed = 0:0:01:36 / Mon Jun 14 16:20:21 2021
GPGPU-Sim PTX: 26600000 instructions simulated : ctaid=(0,1,0) tid=(3,4,0)
GPGPU-Sim PTX: 26700000 instructions simulated : ctaid=(13,1,0) tid=(5,1,0)
GPGPU-Sim PTX: 26800000 instructions simulated : ctaid=(14,1,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 43000  inst.: 26799472 (ipc=623.2) sim_rate=273464 (inst/sec) elapsed = 0:0:01:38 / Mon Jun 14 16:20:23 2021
GPGPU-Sim PTX: 26900000 instructions simulated : ctaid=(14,5,0) tid=(1,9,0)
GPGPU-Sim PTX: 27000000 instructions simulated : ctaid=(9,8,0) tid=(5,7,0)
GPGPU-Sim PTX: 27100000 instructions simulated : ctaid=(3,1,0) tid=(5,7,0)
GPGPU-Sim uArch: cycles simulated: 43500  inst.: 27137944 (ipc=623.9) sim_rate=274120 (inst/sec) elapsed = 0:0:01:39 / Mon Jun 14 16:20:24 2021
GPGPU-Sim PTX: 27200000 instructions simulated : ctaid=(5,13,0) tid=(7,0,0)
GPGPU-Sim PTX: 27300000 instructions simulated : ctaid=(11,8,0) tid=(7,6,0)
GPGPU-Sim PTX: 27400000 instructions simulated : ctaid=(8,1,0) tid=(9,3,0)
GPGPU-Sim PTX: 27500000 instructions simulated : ctaid=(10,4,0) tid=(5,9,0)
GPGPU-Sim uArch: cycles simulated: 44000  inst.: 27461176 (ipc=624.1) sim_rate=274611 (inst/sec) elapsed = 0:0:01:40 / Mon Jun 14 16:20:25 2021
GPGPU-Sim PTX: 27600000 instructions simulated : ctaid=(3,4,0) tid=(5,5,0)
GPGPU-Sim PTX: 27700000 instructions simulated : ctaid=(6,2,0) tid=(3,2,0)
GPGPU-Sim PTX: 27800000 instructions simulated : ctaid=(0,8,0) tid=(7,2,0)
GPGPU-Sim uArch: cycles simulated: 44500  inst.: 27822080 (ipc=625.2) sim_rate=275466 (inst/sec) elapsed = 0:0:01:41 / Mon Jun 14 16:20:26 2021
GPGPU-Sim PTX: 27900000 instructions simulated : ctaid=(6,10,0) tid=(5,7,0)
GPGPU-Sim PTX: 28000000 instructions simulated : ctaid=(9,6,0) tid=(1,3,0)
GPGPU-Sim PTX: 28100000 instructions simulated : ctaid=(10,3,0) tid=(5,3,0)
GPGPU-Sim uArch: cycles simulated: 45000  inst.: 28146700 (ipc=625.5) sim_rate=275948 (inst/sec) elapsed = 0:0:01:42 / Mon Jun 14 16:20:27 2021
GPGPU-Sim PTX: 28200000 instructions simulated : ctaid=(9,2,0) tid=(1,1,0)
GPGPU-Sim PTX: 28300000 instructions simulated : ctaid=(10,6,0) tid=(5,5,0)
GPGPU-Sim PTX: 28400000 instructions simulated : ctaid=(2,0,0) tid=(1,5,0)
GPGPU-Sim PTX: 28500000 instructions simulated : ctaid=(4,14,0) tid=(1,7,0)
GPGPU-Sim uArch: cycles simulated: 45500  inst.: 28480056 (ipc=625.9) sim_rate=276505 (inst/sec) elapsed = 0:0:01:43 / Mon Jun 14 16:20:28 2021
GPGPU-Sim PTX: 28600000 instructions simulated : ctaid=(8,11,0) tid=(7,4,0)
GPGPU-Sim PTX: 28700000 instructions simulated : ctaid=(13,7,0) tid=(9,5,0)
GPGPU-Sim PTX: 28800000 instructions simulated : ctaid=(8,10,0) tid=(9,1,0)
GPGPU-Sim uArch: cycles simulated: 46000  inst.: 28803560 (ipc=626.2) sim_rate=276957 (inst/sec) elapsed = 0:0:01:44 / Mon Jun 14 16:20:29 2021
GPGPU-Sim PTX: 28900000 instructions simulated : ctaid=(14,4,0) tid=(1,9,0)
GPGPU-Sim PTX: 29000000 instructions simulated : ctaid=(7,5,0) tid=(9,1,0)
GPGPU-Sim PTX: 29100000 instructions simulated : ctaid=(11,1,0) tid=(5,5,0)
GPGPU-Sim uArch: cycles simulated: 46500  inst.: 29129700 (ipc=626.4) sim_rate=277425 (inst/sec) elapsed = 0:0:01:45 / Mon Jun 14 16:20:30 2021
GPGPU-Sim PTX: 29200000 instructions simulated : ctaid=(2,9,0) tid=(9,1,0)
GPGPU-Sim PTX: 29300000 instructions simulated : ctaid=(14,6,0) tid=(3,8,0)
GPGPU-Sim PTX: 29400000 instructions simulated : ctaid=(12,14,0) tid=(5,1,0)
GPGPU-Sim PTX: 29500000 instructions simulated : ctaid=(8,14,0) tid=(3,2,0)
GPGPU-Sim uArch: cycles simulated: 47000  inst.: 29456612 (ipc=626.7) sim_rate=275295 (inst/sec) elapsed = 0:0:01:47 / Mon Jun 14 16:20:32 2021
GPGPU-Sim PTX: 29600000 instructions simulated : ctaid=(12,14,0) tid=(7,0,0)
GPGPU-Sim PTX: 29700000 instructions simulated : ctaid=(8,9,0) tid=(5,9,0)
GPGPU-Sim PTX: 29800000 instructions simulated : ctaid=(7,13,0) tid=(3,4,0)
GPGPU-Sim uArch: cycles simulated: 47500  inst.: 29786252 (ipc=627.1) sim_rate=275798 (inst/sec) elapsed = 0:0:01:48 / Mon Jun 14 16:20:33 2021
GPGPU-Sim PTX: 29900000 instructions simulated : ctaid=(5,14,0) tid=(7,0,0)
GPGPU-Sim PTX: 30000000 instructions simulated : ctaid=(12,9,0) tid=(3,8,0)
GPGPU-Sim PTX: 30100000 instructions simulated : ctaid=(7,9,0) tid=(9,9,0)
GPGPU-Sim uArch: cycles simulated: 48000  inst.: 30128620 (ipc=627.7) sim_rate=276409 (inst/sec) elapsed = 0:0:01:49 / Mon Jun 14 16:20:34 2021
GPGPU-Sim PTX: 30200000 instructions simulated : ctaid=(7,3,0) tid=(7,2,0)
GPGPU-Sim PTX: 30300000 instructions simulated : ctaid=(5,5,0) tid=(5,1,0)
GPGPU-Sim PTX: 30400000 instructions simulated : ctaid=(8,2,0) tid=(5,5,0)
GPGPU-Sim PTX: 30500000 instructions simulated : ctaid=(5,7,0) tid=(7,0,0)
GPGPU-Sim uArch: cycles simulated: 48500  inst.: 30458644 (ipc=628.0) sim_rate=276896 (inst/sec) elapsed = 0:0:01:50 / Mon Jun 14 16:20:35 2021
GPGPU-Sim PTX: 30600000 instructions simulated : ctaid=(8,9,0) tid=(9,3,0)
GPGPU-Sim PTX: 30700000 instructions simulated : ctaid=(11,6,0) tid=(9,7,0)
GPGPU-Sim PTX: 30800000 instructions simulated : ctaid=(14,14,0) tid=(5,3,0)
GPGPU-Sim uArch: cycles simulated: 49000  inst.: 30791028 (ipc=628.4) sim_rate=277396 (inst/sec) elapsed = 0:0:01:51 / Mon Jun 14 16:20:36 2021
GPGPU-Sim PTX: 30900000 instructions simulated : ctaid=(1,13,0) tid=(7,2,0)
GPGPU-Sim PTX: 31000000 instructions simulated : ctaid=(5,13,0) tid=(3,8,0)
GPGPU-Sim PTX: 31100000 instructions simulated : ctaid=(14,8,0) tid=(9,7,0)
GPGPU-Sim uArch: cycles simulated: 49500  inst.: 31060536 (ipc=627.5) sim_rate=277326 (inst/sec) elapsed = 0:0:01:52 / Mon Jun 14 16:20:37 2021
GPGPU-Sim uArch: cycles simulated: 50000  inst.: 31139460 (ipc=622.8) sim_rate=275570 (inst/sec) elapsed = 0:0:01:53 / Mon Jun 14 16:20:38 2021
GPGPU-Sim PTX: 31200000 instructions simulated : ctaid=(11,12,0) tid=(1,5,0)
GPGPU-Sim uArch: Shader 52 finished CTA #0 (50169,0), 1 CTAs running
GPGPU-Sim uArch: Shader 9 finished CTA #0 (50210,0), 1 CTAs running
GPGPU-Sim uArch: Shader 52 finished CTA #1 (50232,0), 0 CTAs running
GPGPU-Sim uArch: Shader 52 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 28 finished CTA #0 (50379,0), 1 CTAs running
GPGPU-Sim uArch: Shader 42 finished CTA #0 (50398,0), 1 CTAs running
GPGPU-Sim uArch: Shader 57 finished CTA #0 (50496,0), 1 CTAs running
GPGPU-Sim uArch: cycles simulated: 50500  inst.: 31175264 (ipc=617.3) sim_rate=273467 (inst/sec) elapsed = 0:0:01:54 / Mon Jun 14 16:20:39 2021
GPGPU-Sim uArch: Shader 65 finished CTA #0 (50597,0), 1 CTAs running
GPGPU-Sim uArch: Shader 104 finished CTA #0 (50638,0), 1 CTAs running
GPGPU-Sim uArch: Shader 30 finished CTA #1 (50665,0), 1 CTAs running
GPGPU-Sim uArch: Shader 112 finished CTA #0 (50754,0), 1 CTAs running
GPGPU-Sim uArch: Shader 28 finished CTA #1 (50810,0), 0 CTAs running
GPGPU-Sim uArch: Shader 28 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 57 finished CTA #1 (50829,0), 0 CTAs running
GPGPU-Sim uArch: Shader 57 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 75 finished CTA #0 (50837,0), 1 CTAs running
GPGPU-Sim uArch: Shader 106 finished CTA #0 (50839,0), 1 CTAs running
GPGPU-Sim uArch: Shader 60 finished CTA #0 (50843,0), 1 CTAs running
GPGPU-Sim uArch: Shader 47 finished CTA #0 (50845,0), 0 CTAs running
GPGPU-Sim uArch: Shader 47 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 88 finished CTA #0 (50861,0), 1 CTAs running
GPGPU-Sim uArch: Shader 9 finished CTA #1 (50862,0), 0 CTAs running
GPGPU-Sim uArch: Shader 9 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 88 finished CTA #1 (50894,0), 0 CTAs running
GPGPU-Sim uArch: Shader 88 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 112 finished CTA #1 (50899,0), 0 CTAs running
GPGPU-Sim uArch: Shader 112 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 87 finished CTA #0 (50903,0), 0 CTAs running
GPGPU-Sim uArch: Shader 87 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 60 finished CTA #1 (50905,0), 0 CTAs running
GPGPU-Sim uArch: Shader 60 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 55 finished CTA #0 (50915,0), 0 CTAs running
GPGPU-Sim uArch: Shader 55 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 30 finished CTA #0 (50933,0), 0 CTAs running
GPGPU-Sim uArch: Shader 30 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 32 finished CTA #0 (50948,0), 1 CTAs running
GPGPU-Sim uArch: Shader 39 finished CTA #0 (50953,0), 0 CTAs running
GPGPU-Sim uArch: Shader 39 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 73 finished CTA #0 (50959,0), 1 CTAs running
GPGPU-Sim uArch: Shader 114 finished CTA #0 (50971,0), 1 CTAs running
GPGPU-Sim uArch: Shader 80 finished CTA #0 (50985,0), 1 CTAs running
GPGPU-Sim uArch: Shader 0 finished CTA #0 (50990,0), 1 CTAs running
GPGPU-Sim uArch: cycles simulated: 51000  inst.: 31177832 (ipc=611.3) sim_rate=271111 (inst/sec) elapsed = 0:0:01:55 / Mon Jun 14 16:20:40 2021
GPGPU-Sim uArch: Shader 29 finished CTA #1 (51027,0), 1 CTAs running
GPGPU-Sim uArch: Shader 23 finished CTA #0 (51028,0), 0 CTAs running
GPGPU-Sim uArch: Shader 23 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 49 finished CTA #0 (51028,0), 1 CTAs running
GPGPU-Sim uArch: Shader 63 finished CTA #0 (51035,0), 0 CTAs running
GPGPU-Sim uArch: Shader 63 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 29 finished CTA #0 (51056,0), 0 CTAs running
GPGPU-Sim uArch: Shader 29 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 86 finished CTA #0 (51059,0), 1 CTAs running
GPGPU-Sim uArch: Shader 32 finished CTA #1 (51065,0), 0 CTAs running
GPGPU-Sim uArch: Shader 32 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 17 finished CTA #0 (51066,0), 1 CTAs running
GPGPU-Sim uArch: Shader 96 finished CTA #0 (51082,0), 1 CTAs running
GPGPU-Sim uArch: Shader 73 finished CTA #1 (51111,0), 0 CTAs running
GPGPU-Sim uArch: Shader 73 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 89 finished CTA #1 (51114,0), 1 CTAs running
GPGPU-Sim uArch: Shader 74 finished CTA #0 (51119,0), 1 CTAs running
GPGPU-Sim uArch: Shader 34 finished CTA #0 (51138,0), 1 CTAs running
GPGPU-Sim uArch: Shader 94 finished CTA #0 (51144,0), 1 CTAs running
GPGPU-Sim uArch: Shader 17 finished CTA #1 (51149,0), 0 CTAs running
GPGPU-Sim uArch: Shader 17 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 92 finished CTA #0 (51157,0), 1 CTAs running
GPGPU-Sim uArch: Shader 109 finished CTA #0 (51193,0), 1 CTAs running
GPGPU-Sim uArch: Shader 75 finished CTA #1 (51194,0), 0 CTAs running
GPGPU-Sim uArch: Shader 75 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 90 finished CTA #0 (51200,0), 1 CTAs running
GPGPU-Sim uArch: Shader 15 finished CTA #0 (51206,0), 0 CTAs running
GPGPU-Sim uArch: Shader 15 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 26 finished CTA #0 (51206,0), 1 CTAs running
GPGPU-Sim uArch: Shader 72 finished CTA #0 (51210,0), 1 CTAs running
GPGPU-Sim uArch: Shader 89 finished CTA #0 (51216,0), 0 CTAs running
GPGPU-Sim uArch: Shader 89 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 41 finished CTA #0 (51219,0), 1 CTAs running
GPGPU-Sim uArch: Shader 2 finished CTA #0 (51227,0), 1 CTAs running
GPGPU-Sim uArch: Shader 53 finished CTA #1 (51230,0), 1 CTAs running
GPGPU-Sim uArch: Shader 37 finished CTA #0 (51232,0), 1 CTAs running
GPGPU-Sim uArch: Shader 56 finished CTA #0 (51241,0), 1 CTAs running
GPGPU-Sim uArch: Shader 50 finished CTA #0 (51243,0), 1 CTAs running
GPGPU-Sim uArch: Shader 65 finished CTA #1 (51243,0), 0 CTAs running
GPGPU-Sim uArch: Shader 65 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 10 finished CTA #0 (51255,0), 1 CTAs running
GPGPU-Sim uArch: Shader 16 finished CTA #0 (51260,0), 1 CTAs running
GPGPU-Sim uArch: Shader 96 finished CTA #1 (51266,0), 0 CTAs running
GPGPU-Sim uArch: Shader 96 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 66 finished CTA #0 (51293,0), 1 CTAs running
GPGPU-Sim uArch: Shader 81 finished CTA #0 (51296,0), 1 CTAs running
GPGPU-Sim uArch: Shader 62 finished CTA #1 (51304,0), 1 CTAs running
GPGPU-Sim uArch: Shader 41 finished CTA #1 (51308,0), 0 CTAs running
GPGPU-Sim uArch: Shader 41 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 114 finished CTA #1 (51320,0), 0 CTAs running
GPGPU-Sim uArch: Shader 114 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 71 finished CTA #0 (51330,0), 0 CTAs running
GPGPU-Sim uArch: Shader 71 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 61 finished CTA #0 (51340,0), 1 CTAs running
GPGPU-Sim uArch: Shader 58 finished CTA #1 (51345,0), 1 CTAs running
GPGPU-Sim uArch: Shader 103 finished CTA #0 (51352,0), 0 CTAs running
GPGPU-Sim uArch: Shader 103 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 80 finished CTA #1 (51361,0), 0 CTAs running
GPGPU-Sim uArch: Shader 80 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 72 finished CTA #1 (51366,0), 0 CTAs running
GPGPU-Sim uArch: Shader 72 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 0 finished CTA #1 (51368,0), 0 CTAs running
GPGPU-Sim uArch: Shader 0 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 49 finished CTA #1 (51383,0), 0 CTAs running
GPGPU-Sim uArch: Shader 49 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 25 finished CTA #0 (51386,0), 1 CTAs running
GPGPU-Sim uArch: Shader 26 finished CTA #1 (51391,0), 0 CTAs running
GPGPU-Sim uArch: Shader 26 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 94 finished CTA #1 (51399,0), 0 CTAs running
GPGPU-Sim uArch: Shader 94 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 13 finished CTA #0 (51400,0), 1 CTAs running
GPGPU-Sim uArch: Shader 2 finished CTA #1 (51406,0), 0 CTAs running
GPGPU-Sim uArch: Shader 2 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 62 finished CTA #0 (51412,0), 0 CTAs running
GPGPU-Sim uArch: Shader 62 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 91 finished CTA #0 (51415,0), 1 CTAs running
GPGPU-Sim uArch: Shader 90 finished CTA #1 (51420,0), 0 CTAs running
GPGPU-Sim uArch: Shader 90 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 54 finished CTA #0 (51425,0), 1 CTAs running
GPGPU-Sim uArch: Shader 16 finished CTA #1 (51430,0), 0 CTAs running
GPGPU-Sim uArch: Shader 16 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 102 finished CTA #0 (51434,0), 1 CTAs running
GPGPU-Sim uArch: Shader 21 finished CTA #0 (51436,0), 1 CTAs running
GPGPU-Sim uArch: Shader 98 finished CTA #0 (51438,0), 1 CTAs running
GPGPU-Sim uArch: Shader 81 finished CTA #1 (51439,0), 0 CTAs running
GPGPU-Sim uArch: Shader 81 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 31 finished CTA #0 (51446,0), 0 CTAs running
GPGPU-Sim uArch: Shader 31 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 91 finished CTA #1 (51452,0), 0 CTAs running
GPGPU-Sim uArch: Shader 91 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 82 finished CTA #0 (51455,0), 1 CTAs running
GPGPU-Sim uArch: Shader 8 finished CTA #0 (51456,0), 1 CTAs running
GPGPU-Sim uArch: Shader 56 finished CTA #1 (51461,0), 0 CTAs running
GPGPU-Sim uArch: Shader 56 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 95 finished CTA #0 (51462,0), 0 CTAs running
GPGPU-Sim uArch: Shader 95 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 58 finished CTA #0 (51482,0), 0 CTAs running
GPGPU-Sim uArch: Shader 58 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 42 finished CTA #1 (51487,0), 0 CTAs running
GPGPU-Sim uArch: Shader 42 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 86 finished CTA #1 (51492,0), 0 CTAs running
GPGPU-Sim uArch: Shader 86 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 13 finished CTA #1 (51494,0), 0 CTAs running
GPGPU-Sim uArch: Shader 13 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: cycles simulated: 51500  inst.: 31182868 (ipc=605.5) sim_rate=268817 (inst/sec) elapsed = 0:0:01:56 / Mon Jun 14 16:20:41 2021
GPGPU-Sim uArch: Shader 64 finished CTA #1 (51500,0), 1 CTAs running
GPGPU-Sim uArch: Shader 107 finished CTA #0 (51500,0), 1 CTAs running
GPGPU-Sim uArch: Shader 99 finished CTA #1 (51503,0), 1 CTAs running
GPGPU-Sim uArch: Shader 10 finished CTA #1 (51508,0), 0 CTAs running
GPGPU-Sim uArch: Shader 10 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 20 finished CTA #0 (51509,0), 1 CTAs running
GPGPU-Sim uArch: Shader 76 finished CTA #1 (51516,0), 1 CTAs running
GPGPU-Sim uArch: Shader 82 finished CTA #1 (51518,0), 0 CTAs running
GPGPU-Sim uArch: Shader 82 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 83 finished CTA #0 (51528,0), 1 CTAs running
GPGPU-Sim uArch: Shader 99 finished CTA #0 (51530,0), 0 CTAs running
GPGPU-Sim uArch: Shader 99 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 107 finished CTA #1 (51533,0), 0 CTAs running
GPGPU-Sim uArch: Shader 107 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 59 finished CTA #0 (51536,0), 1 CTAs running
GPGPU-Sim uArch: Shader 33 finished CTA #1 (51538,0), 1 CTAs running
GPGPU-Sim uArch: Shader 106 finished CTA #1 (51549,0), 0 CTAs running
GPGPU-Sim uArch: Shader 106 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 18 finished CTA #0 (51553,0), 1 CTAs running
GPGPU-Sim uArch: Shader 35 finished CTA #0 (51559,0), 1 CTAs running
GPGPU-Sim uArch: Shader 61 finished CTA #1 (51560,0), 0 CTAs running
GPGPU-Sim uArch: Shader 61 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 40 finished CTA #1 (51569,0), 1 CTAs running
GPGPU-Sim uArch: Shader 76 finished CTA #0 (51582,0), 0 CTAs running
GPGPU-Sim uArch: Shader 76 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 83 finished CTA #1 (51582,0), 0 CTAs running
GPGPU-Sim uArch: Shader 83 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 25 finished CTA #1 (51588,0), 0 CTAs running
GPGPU-Sim uArch: Shader 25 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 3 finished CTA #0 (51593,0), 1 CTAs running
GPGPU-Sim uArch: Shader 27 finished CTA #0 (51594,0), 1 CTAs running
GPGPU-Sim uArch: Shader 33 finished CTA #0 (51602,0), 0 CTAs running
GPGPU-Sim uArch: Shader 33 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 48 finished CTA #1 (51614,0), 1 CTAs running
GPGPU-Sim uArch: Shader 119 finished CTA #0 (51624,0), 0 CTAs running
GPGPU-Sim uArch: Shader 119 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 11 finished CTA #0 (51626,0), 1 CTAs running
GPGPU-Sim uArch: Shader 84 finished CTA #0 (51628,0), 1 CTAs running
GPGPU-Sim uArch: Shader 97 finished CTA #0 (51634,0), 1 CTAs running
GPGPU-Sim uArch: Shader 18 finished CTA #1 (51639,0), 0 CTAs running
GPGPU-Sim uArch: Shader 18 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 93 finished CTA #0 (51640,0), 1 CTAs running
GPGPU-Sim uArch: Shader 78 finished CTA #0 (51650,0), 1 CTAs running
GPGPU-Sim uArch: Shader 22 finished CTA #0 (51652,0), 1 CTAs running
GPGPU-Sim uArch: Shader 36 finished CTA #0 (51656,0), 1 CTAs running
GPGPU-Sim uArch: Shader 108 finished CTA #0 (51657,0), 1 CTAs running
GPGPU-Sim uArch: Shader 6 finished CTA #0 (51660,0), 1 CTAs running
GPGPU-Sim uArch: Shader 40 finished CTA #0 (51662,0), 0 CTAs running
GPGPU-Sim uArch: Shader 40 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 84 finished CTA #1 (51666,0), 0 CTAs running
GPGPU-Sim uArch: Shader 84 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 92 finished CTA #1 (51668,0), 0 CTAs running
GPGPU-Sim uArch: Shader 92 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 115 finished CTA #0 (51668,0), 1 CTAs running
GPGPU-Sim uArch: Shader 24 finished CTA #1 (51670,0), 1 CTAs running
GPGPU-Sim uArch: Shader 24 finished CTA #0 (51674,0), 0 CTAs running
GPGPU-Sim uArch: Shader 24 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 3 finished CTA #1 (51678,0), 0 CTAs running
GPGPU-Sim uArch: Shader 3 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 59 finished CTA #1 (51683,0), 0 CTAs running
GPGPU-Sim uArch: Shader 59 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 8 finished CTA #1 (51687,0), 0 CTAs running
GPGPU-Sim uArch: Shader 8 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 115 finished CTA #1 (51688,0), 0 CTAs running
GPGPU-Sim uArch: Shader 115 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 85 finished CTA #0 (51689,0), 1 CTAs running
GPGPU-Sim uArch: Shader 111 finished CTA #0 (51699,0), 0 CTAs running
GPGPU-Sim uArch: Shader 111 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 116 finished CTA #0 (51699,0), 1 CTAs running
GPGPU-Sim uArch: Shader 34 finished CTA #1 (51701,0), 0 CTAs running
GPGPU-Sim uArch: Shader 34 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 54 finished CTA #1 (51715,0), 0 CTAs running
GPGPU-Sim uArch: Shader 54 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 104 finished CTA #1 (51715,0), 0 CTAs running
GPGPU-Sim uArch: Shader 104 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 48 finished CTA #0 (51730,0), 0 CTAs running
GPGPU-Sim uArch: Shader 48 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 20 finished CTA #1 (51739,0), 0 CTAs running
GPGPU-Sim uArch: Shader 20 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 11 finished CTA #1 (51741,0), 0 CTAs running
GPGPU-Sim uArch: Shader 11 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 97 finished CTA #1 (51748,0), 0 CTAs running
GPGPU-Sim uArch: Shader 97 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 113 finished CTA #0 (51749,0), 1 CTAs running
GPGPU-Sim uArch: Shader 113 finished CTA #1 (51751,0), 0 CTAs running
GPGPU-Sim uArch: Shader 113 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 27 finished CTA #1 (51754,0), 0 CTAs running
GPGPU-Sim uArch: Shader 27 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 53 finished CTA #0 (51754,0), 0 CTAs running
GPGPU-Sim uArch: Shader 53 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 105 finished CTA #0 (51756,0), 1 CTAs running
GPGPU-Sim uArch: Shader 43 finished CTA #0 (51767,0), 1 CTAs running
GPGPU-Sim uArch: Shader 36 finished CTA #1 (51771,0), 0 CTAs running
GPGPU-Sim uArch: Shader 36 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 105 finished CTA #1 (51776,0), 0 CTAs running
GPGPU-Sim uArch: Shader 105 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 45 finished CTA #0 (51783,0), 1 CTAs running
GPGPU-Sim uArch: Shader 93 finished CTA #1 (51788,0), 0 CTAs running
GPGPU-Sim uArch: Shader 93 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 67 finished CTA #0 (51790,0), 1 CTAs running
GPGPU-Sim uArch: Shader 50 finished CTA #1 (51794,0), 0 CTAs running
GPGPU-Sim uArch: Shader 50 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 67 finished CTA #1 (51796,0), 0 CTAs running
GPGPU-Sim uArch: Shader 67 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 7 finished CTA #0 (51797,0), 0 CTAs running
GPGPU-Sim uArch: Shader 7 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 19 finished CTA #0 (51797,0), 1 CTAs running
GPGPU-Sim uArch: Shader 74 finished CTA #1 (51800,0), 0 CTAs running
GPGPU-Sim uArch: Shader 74 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 43 finished CTA #1 (51801,0), 0 CTAs running
GPGPU-Sim uArch: Shader 43 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 102 finished CTA #1 (51802,0), 0 CTAs running
GPGPU-Sim uArch: Shader 102 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 79 finished CTA #0 (51810,0), 0 CTAs running
GPGPU-Sim uArch: Shader 79 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 98 finished CTA #1 (51811,0), 0 CTAs running
GPGPU-Sim uArch: Shader 98 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 85 finished CTA #1 (51813,0), 0 CTAs running
GPGPU-Sim uArch: Shader 85 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 51 finished CTA #0 (51824,0), 1 CTAs running
GPGPU-Sim uArch: Shader 66 finished CTA #1 (51833,0), 0 CTAs running
GPGPU-Sim uArch: Shader 66 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 12 finished CTA #0 (51840,0), 1 CTAs running
GPGPU-Sim uArch: Shader 35 finished CTA #1 (51840,0), 0 CTAs running
GPGPU-Sim uArch: Shader 35 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 44 finished CTA #0 (51841,0), 1 CTAs running
GPGPU-Sim uArch: Shader 101 finished CTA #1 (51842,0), 1 CTAs running
GPGPU-Sim uArch: Shader 64 finished CTA #0 (51847,0), 0 CTAs running
GPGPU-Sim uArch: Shader 64 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 37 finished CTA #1 (51851,0), 0 CTAs running
GPGPU-Sim uArch: Shader 37 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 77 finished CTA #0 (51851,0), 1 CTAs running
GPGPU-Sim uArch: Shader 51 finished CTA #1 (51855,0), 0 CTAs running
GPGPU-Sim uArch: Shader 51 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 109 finished CTA #1 (51857,0), 0 CTAs running
GPGPU-Sim uArch: Shader 109 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 12 finished CTA #1 (51860,0), 0 CTAs running
GPGPU-Sim uArch: Shader 12 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 6 finished CTA #1 (51862,0), 0 CTAs running
GPGPU-Sim uArch: Shader 6 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 116 finished CTA #1 (51867,0), 0 CTAs running
GPGPU-Sim uArch: Shader 116 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 101 finished CTA #0 (51870,0), 0 CTAs running
GPGPU-Sim uArch: Shader 101 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 117 finished CTA #1 (51871,0), 1 CTAs running
GPGPU-Sim uArch: Shader 38 finished CTA #0 (51881,0), 1 CTAs running
GPGPU-Sim uArch: Shader 19 finished CTA #1 (51882,0), 0 CTAs running
GPGPU-Sim uArch: Shader 19 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 38 finished CTA #1 (51887,0), 0 CTAs running
GPGPU-Sim uArch: Shader 38 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 117 finished CTA #0 (51887,0), 0 CTAs running
GPGPU-Sim uArch: Shader 117 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 1 finished CTA #0 (51902,0), 1 CTAs running
GPGPU-Sim uArch: Shader 44 finished CTA #1 (51906,0), 0 CTAs running
GPGPU-Sim uArch: Shader 44 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 14 finished CTA #0 (51907,0), 1 CTAs running
GPGPU-Sim uArch: Shader 21 finished CTA #1 (51908,0), 0 CTAs running
GPGPU-Sim uArch: Shader 21 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 14 finished CTA #1 (51913,0), 0 CTAs running
GPGPU-Sim uArch: Shader 14 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 1 finished CTA #1 (51923,0), 0 CTAs running
GPGPU-Sim uArch: Shader 1 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 22 finished CTA #1 (51928,0), 0 CTAs running
GPGPU-Sim uArch: Shader 22 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 100 finished CTA #0 (51933,0), 1 CTAs running
GPGPU-Sim uArch: Shader 4 finished CTA #0 (51939,0), 1 CTAs running
GPGPU-Sim uArch: Shader 110 finished CTA #0 (51939,0), 1 CTAs running
GPGPU-Sim uArch: Shader 78 finished CTA #1 (51944,0), 0 CTAs running
GPGPU-Sim uArch: Shader 78 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 108 finished CTA #1 (51951,0), 0 CTAs running
GPGPU-Sim uArch: Shader 108 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 45 finished CTA #1 (51964,0), 0 CTAs running
GPGPU-Sim uArch: Shader 45 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 68 finished CTA #0 (51966,0), 1 CTAs running
GPGPU-Sim uArch: Shader 77 finished CTA #1 (51966,0), 0 CTAs running
GPGPU-Sim uArch: Shader 77 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 68 finished CTA #1 (51969,0), 0 CTAs running
GPGPU-Sim uArch: Shader 68 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 118 finished CTA #0 (51986,0), 1 CTAs running
GPGPU-Sim uArch: Shader 70 finished CTA #0 (52000,0), 1 CTAs running
GPGPU-Sim uArch: Shader 46 finished CTA #0 (52004,0), 1 CTAs running
GPGPU-Sim uArch: Shader 4 finished CTA #1 (52012,0), 0 CTAs running
GPGPU-Sim uArch: Shader 4 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 100 finished CTA #1 (52016,0), 0 CTAs running
GPGPU-Sim uArch: Shader 100 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 110 finished CTA #1 (52020,0), 0 CTAs running
GPGPU-Sim uArch: Shader 110 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 118 finished CTA #1 (52024,0), 0 CTAs running
GPGPU-Sim uArch: Shader 118 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 5 finished CTA #0 (52044,0), 1 CTAs running
GPGPU-Sim uArch: Shader 46 finished CTA #1 (52044,0), 0 CTAs running
GPGPU-Sim uArch: Shader 46 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 69 finished CTA #0 (52048,0), 1 CTAs running
GPGPU-Sim uArch: Shader 5 finished CTA #1 (52061,0), 0 CTAs running
GPGPU-Sim uArch: Shader 5 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 69 finished CTA #1 (52107,0), 0 CTAs running
GPGPU-Sim uArch: Shader 69 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: Shader 70 finished CTA #1 (52113,0), 0 CTAs running
GPGPU-Sim uArch: Shader 70 empty (release kernel 1 '_Z14matrix_mul_gpuPiS_S_i').
GPGPU-Sim uArch: GPU detected kernel '_Z14matrix_mul_gpuPiS_S_i' finished on shader 70.
kernel_name = _Z14matrix_mul_gpuPiS_S_i 
kernel_launch_uid = 1 
gpu_sim_cycle = 52114
gpu_sim_insn = 31185000
gpu_ipc =     598.3997
gpu_tot_sim_cycle = 52114
gpu_tot_sim_insn = 31185000
gpu_tot_ipc =     598.3997
gpu_tot_issued_cta = 0
gpu_stall_dramfull = 74963
gpu_stall_icnt2sh    = 150961
gpu_total_sim_rate=268836

========= Core cache stats =========
L1I_cache:
	L1I_total_cache_accesses = 696598
	L1I_total_cache_misses = 3598
	L1I_total_cache_miss_rate = 0.0052
	L1I_total_cache_pending_hits = 0
	L1I_total_cache_reservation_fails = 0
L1D_cache:
	L1D_cache_core[0]: Access = 40337, Miss = 2556, Miss_rate = 0.063, Pending_hits = 9587, Reservation_fails = 17954
	L1D_cache_core[1]: Access = 40291, Miss = 2547, Miss_rate = 0.063, Pending_hits = 9564, Reservation_fails = 17242
	L1D_cache_core[2]: Access = 40297, Miss = 2561, Miss_rate = 0.064, Pending_hits = 9568, Reservation_fails = 19637
	L1D_cache_core[3]: Access = 40231, Miss = 2551, Miss_rate = 0.063, Pending_hits = 9512, Reservation_fails = 20346
	L1D_cache_core[4]: Access = 40282, Miss = 2570, Miss_rate = 0.064, Pending_hits = 9553, Reservation_fails = 20125
	L1D_cache_core[5]: Access = 40276, Miss = 2563, Miss_rate = 0.064, Pending_hits = 9550, Reservation_fails = 21507
	L1D_cache_core[6]: Access = 40282, Miss = 2561, Miss_rate = 0.064, Pending_hits = 9549, Reservation_fails = 17583
	L1D_cache_core[7]: Access = 40322, Miss = 2555, Miss_rate = 0.063, Pending_hits = 9582, Reservation_fails = 19592
	L1D_cache_core[8]: Access = 40328, Miss = 2556, Miss_rate = 0.063, Pending_hits = 9572, Reservation_fails = 18611
	L1D_cache_core[9]: Access = 40323, Miss = 2556, Miss_rate = 0.063, Pending_hits = 9579, Reservation_fails = 17753
	L1D_cache_core[10]: Access = 40328, Miss = 2568, Miss_rate = 0.064, Pending_hits = 9588, Reservation_fails = 19928
	L1D_cache_core[11]: Access = 40322, Miss = 2570, Miss_rate = 0.064, Pending_hits = 9591, Reservation_fails = 18029
	L1D_cache_core[12]: Access = 40328, Miss = 2580, Miss_rate = 0.064, Pending_hits = 9588, Reservation_fails = 17457
	L1D_cache_core[13]: Access = 40338, Miss = 2572, Miss_rate = 0.064, Pending_hits = 9599, Reservation_fails = 17935
	L1D_cache_core[14]: Access = 40343, Miss = 2570, Miss_rate = 0.064, Pending_hits = 9605, Reservation_fails = 16746
	L1D_total_cache_accesses = 604628
	L1D_total_cache_misses = 38436
	L1D_total_cache_miss_rate = 0.0636
	L1D_total_cache_pending_hits = 143587
	L1D_total_cache_reservation_fails = 280445
	L1D_cache_data_port_util = 0.068
	L1D_cache_fill_port_util = 0.006
L1C_cache:
	L1C_total_cache_accesses = 3600
	L1C_total_cache_misses = 900
	L1C_total_cache_miss_rate = 0.2500
	L1C_total_cache_pending_hits = 0
	L1C_total_cache_reservation_fails = 0
L1T_cache:
	L1T_total_cache_accesses = 0
	L1T_total_cache_misses = 0
	L1T_total_cache_pending_hits = 0
	L1T_total_cache_reservation_fails = 0

Total_core_cache_stats:
	Total_core_cache_stats_breakdown[GLOBAL_ACC_R][HIT] = 422605
	Total_core_cache_stats_breakdown[GLOBAL_ACC_R][HIT_RESERVED] = 143587
	Total_core_cache_stats_breakdown[GLOBAL_ACC_R][MISS] = 34993
	Total_core_cache_stats_breakdown[GLOBAL_ACC_R][RESERVATION_FAIL] = 135803
	Total_core_cache_stats_breakdown[CONST_ACC_R][HIT] = 2700
	Total_core_cache_stats_breakdown[CONST_ACC_R][MISS] = 900
	Total_core_cache_stats_breakdown[GLOBAL_ACC_W][MISS] = 3443
	Total_core_cache_stats_breakdown[GLOBAL_ACC_W][RESERVATION_FAIL] = 144642
	Total_core_cache_stats_breakdown[INST_ACC_R][HIT] = 693000
	Total_core_cache_stats_breakdown[INST_ACC_R][MISS] = 3598
Shader 0 warp_id issue ditsribution:
warp_id:
0, 1, 2, 3, 4, 5, 6, 7, 
distro:
1388, 1388, 1388, 1388, 1388, 1388, 1388, 1388, 
gpgpu_n_tot_thrd_icount = 39974400
gpgpu_n_tot_w_icount = 1249200
gpgpu_n_stall_shd_mem = 614173
gpgpu_n_mem_read_local = 0
gpgpu_n_mem_write_local = 0
gpgpu_n_mem_read_global = 34993
gpgpu_n_mem_write_global = 3443
gpgpu_n_mem_texture = 0
gpgpu_n_mem_const = 120
gpgpu_n_load_insn  = 6750000
gpgpu_n_store_insn = 22500
gpgpu_n_shmem_insn = 0
gpgpu_n_tex_insn = 0
gpgpu_n_const_mem_insn = 0
gpgpu_n_param_mem_insn = 90000
gpgpu_n_shmem_bkconflict = 0
gpgpu_n_cache_bkconflict = 0
gpgpu_n_intrawarp_mshr_merge = 0
gpgpu_n_cmem_portconflict = 0
gpgpu_stall_shd_mem[c_mem][bk_conf] = 0
gpgpu_stall_shd_mem[c_mem][mshr_rc] = 0
gpgpu_stall_shd_mem[c_mem][icnt_rc] = 0
gpgpu_stall_shd_mem[c_mem][data_port_stall] = 0
gpgpu_stall_shd_mem[t_mem][mshr_rc] = 0
gpgpu_stall_shd_mem[t_mem][icnt_rc] = 0
gpgpu_stall_shd_mem[t_mem][data_port_stall] = 0
gpgpu_stall_shd_mem[s_mem][bk_conf] = 0
gpgpu_stall_shd_mem[gl_mem][bk_conf] = 0
gpgpu_stall_shd_mem[gl_mem][coal_stall] = 614173
gpgpu_stall_shd_mem[gl_mem][data_port_stall] = 0
gpgpu_stall_shd_mem[g_mem_ld][mshr_rc] = 0
gpgpu_stall_shd_mem[g_mem_ld][icnt_rc] = 0
gpgpu_stall_shd_mem[g_mem_ld][wb_icnt_rc] = 0
gpgpu_stall_shd_mem[g_mem_ld][wb_rsrv_fail] = 0
gpgpu_stall_shd_mem[g_mem_st][mshr_rc] = 0
gpgpu_stall_shd_mem[g_mem_st][icnt_rc] = 0
gpgpu_stall_shd_mem[g_mem_st][wb_icnt_rc] = 0
gpgpu_stall_shd_mem[g_mem_st][wb_rsrv_fail] = 0
gpgpu_stall_shd_mem[l_mem_ld][mshr_rc] = 0
gpgpu_stall_shd_mem[l_mem_ld][icnt_rc] = 0
gpgpu_stall_shd_mem[l_mem_ld][wb_icnt_rc] = 0
gpgpu_stall_shd_mem[l_mem_ld][wb_rsrv_fail] = 0
gpgpu_stall_shd_mem[l_mem_st][mshr_rc] = 0
gpgpu_stall_shd_mem[l_mem_st][icnt_rc] = 0
gpgpu_stall_shd_mem[l_mem_ld][wb_icnt_rc] = 0
gpgpu_stall_shd_mem[l_mem_ld][wb_rsrv_fail] = 0
gpu_reg_bank_conflict_stalls = 0
Warp Occupancy Distribution:
Stall:441805	W0_Idle:1288598	W0_Scoreboard:9482477	W1:0	W2:0	W3:0	W4:312300	W5:0	W6:0	W7:0	W8:0	W9:0	W10:0	W11:0	W12:0	W13:0	W14:0	W15:0	W16:0	W17:0	W18:0	W19:0	W20:0	W21:0	W22:0	W23:0	W24:0	W25:0	W26:0	W27:0	W28:0	W29:0	W30:0	W31:0	W32:936900
traffic_breakdown_coretomem[CONST_ACC_R] = 960 {8:120,}
traffic_breakdown_coretomem[GLOBAL_ACC_R] = 279944 {8:34993,}
traffic_breakdown_coretomem[GLOBAL_ACC_W] = 220472 {40:1891,72:1035,136:517,}
traffic_breakdown_coretomem[INST_ACC_R] = 3840 {8:480,}
traffic_breakdown_memtocore[CONST_ACC_R] = 8640 {72:120,}
traffic_breakdown_memtocore[GLOBAL_ACC_R] = 4759048 {136:34993,}
traffic_breakdown_memtocore[GLOBAL_ACC_W] = 27544 {8:3443,}
traffic_breakdown_memtocore[INST_ACC_R] = 65280 {136:480,}
maxmrqlatency = 264 
maxdqlatency = 0 
maxmflatency = 2966 
averagemflatency = 331 
max_icnt2mem_latency = 3027 
max_icnt2sh_latency = 52113 
mrq_lat_table:1864 	85 	65 	133 	191 	266 	357 	250 	1 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
dq_lat_table:0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
mf_lat_table:0 	0 	0 	0 	0 	0 	0 	18641 	15624 	2440 	1703 	148 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
icnt2mem_lat_table:0 	0 	0 	25799 	1252 	2017 	3082 	2408 	1558 	1817 	996 	107 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
icnt2sh_lat_table:0 	0 	0 	5374 	26785 	2935 	19 	0 	0 	0 	0 	0 	0 	0 	0 	3443 	0 	0 	0 	0 	0 	0 	0 	0 	
mf_lat_pw_table:0 	0 	0 	0 	0 	0 	0 	80 	12 	6 	4 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	0 	
maximum concurrent accesses to same row:
dram[0]:         1         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[1]:         2         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[2]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[3]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[4]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[5]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
maximum service time to same row:
dram[0]:     48986     50145         0         0      2487      2476      2843      3294      2488      3444     15918     16579     40804     41559     49682     49938 
dram[1]:     47947     50759         0         0      2569      2004      2601      2463      2029      3357     15954     16900     40935     41738     49974     49682 
dram[2]:     50116     49853         0         0      2013      2401      2741      2529      1994      3279     16219     16887     41074     41907     49691     49690 
dram[3]:     49710     50845         0         0      2009      2478      2429      2621      2751      3897     16263     17171     41160     41972     49677     49878 
dram[4]:     50237     49850         0         0      2488      2003      3332      2538      2485      3278     16469     17200     41247     42166     49919     49687 
dram[5]:     49859     50068         0         0      2007      2513      2469      3322      3301      3935     16591     17199     41469     42218     49687     49888 
average row accesses per activate:
dram[0]:  4.250000 14.000000      -nan      -nan 10.000000 10.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 64.000000 72.000000 83.000000 77.000000 
dram[1]:  7.000000 14.000000      -nan      -nan 10.000000 10.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 77.000000 61.000000 82.000000 85.000000 
dram[2]: 14.000000  9.000000      -nan      -nan 10.000000 12.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 63.000000 63.000000 86.000000 81.000000 
dram[3]: 13.000000  9.000000      -nan      -nan 10.000000 12.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 63.000000 68.000000 75.000000 85.000000 
dram[4]: 15.000000 11.000000      -nan      -nan 10.000000 12.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 68.000000 65.000000 78.000000 88.000000 
dram[5]: 16.000000  8.000000      -nan      -nan 10.000000 12.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 65.000000 67.000000 81.000000 81.000000 
average row locality = 3212/88 = 36.500000
number of total memory accesses made:
dram[0]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[1]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[2]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[3]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[4]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
dram[5]:         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0 
total accesses: 0
min_bank_accesses = 0!
min_chip_accesses = 0!
number of total read accesses:
dram[0]:         9         6         0         0        10        10        32        32        32        32        32        32        32        32        32        32 
dram[1]:         8         6         0         0        10        10        32        32        32        32        32        32        32        32        32        32 
dram[2]:         6         4         0         0        10        12        32        32        32        32        32        32        32        32        32        32 
dram[3]:         6         4         0         0        10        12        32        32        32        32        32        32        32        32        32        32 
dram[4]:         6         4         0         0        10        12        32        32        32        32        32        32        32        32        32        32 
dram[5]:         6         4         0         0        10        12        32        32        32        32        32        32        32        32        32        32 
total reads: 2117
min_bank_accesses = 0!
chip skew: 355/352 = 1.01
number of total write accesses:
dram[0]:         8         8         0         0         0         0         0         0         0         0         0         0        32        40        51        45 
dram[1]:         6         8         0         0         0         0         0         0         0         0         0         0        45        29        50        53 
dram[2]:         8         5         0         0         0         0         0         0         0         0         0         0        31        31        54        49 
dram[3]:         7         5         0         0         0         0         0         0         0         0         0         0        31        36        43        53 
dram[4]:         9         7         0         0         0         0         0         0         0         0         0         0        36        33        46        56 
dram[5]:        10         4         0         0         0         0         0         0         0         0         0         0        33        35        49        49 
total reads: 1095
min_bank_accesses = 0!
chip skew: 191/175 = 1.09
average mf latency per bank:
dram[0]:       5972      1408    none      none        7624      6015      7869      7128      9772      7109      8243      8142      2583      2276      1126      1269
dram[1]:        832      1099    none      none        4678      6681      8215      6378      8926      6471      8058      8269      2353      2426      1412      1324
dram[2]:       1016       863    none      none        7296      5077      8207      5996      7922      6544      8246      8380      2536      2313      1209      1387
dram[3]:       1202       900    none      none        3901      6944      7219      6985      7134      7113      8836      8254      2245      2250      1322      1180
dram[4]:        954      1031    none      none        7742      4595      6331      6396      6875      6570      8289      8526      2078      2526      1299      1065
dram[5]:       1138      1430    none      none        4617      6835      5760      7269      6703      6460      7953      8525      2352      2337      1205      1207
maximum mf latency per bank:
dram[0]:       1445      1801         0         0      1719      1494      2630      2529      2515      1555       378       394      1663      1710      1771      1766
dram[1]:       1412      1863         0         0      1419      1625      2966      1991      2270      1260       385       394      1617      1737      1882      1992
dram[2]:       1748      1134         0         0      1929      1216      2610      2175      2008      1289       410       388      1659      1761      1974      1745
dram[3]:       1799      1558         0         0       615      1704      2483      1767      1313      1553       427       428      1645      1618      1735      1982
dram[4]:       1846      1376         0         0      1908       939      2307      2711      1561      1338       425       399      1707      1747      1971      1751
dram[5]:       1873      1435         0         0       784      1584      1609      2693      1359      1522       431       406      1760      1622      1768      1979

Number of Memory Banks Accessed per Memory Operation per Warp (from 0):
0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	
Average # of Memory Banks Accessed per Memory Operation per Warp=-nan

position of mrq chosen
0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	

average position of mrq chosen = -nan
Memory Partition 0: 
Cache L2_bank_000:
MSHR contents

Cache L2_bank_001:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[0]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67855 n_act=17 n_pre=3 n_req=539 n_rd=710 n_write=205 bw_util=0.0266
n_activity=5774 dram_eff=0.3169
bk0: 18a 68542i bk1: 12a 68532i bk2: 0a 68787i bk3: 0a 68789i bk4: 20a 68735i bk5: 20a 68735i bk6: 64a 68637i bk7: 64a 68637i bk8: 64a 68640i bk9: 64a 68645i bk10: 64a 68647i bk11: 64a 68647i bk12: 64a 67873i bk13: 64a 67623i bk14: 64a 67349i bk15: 64a 67377i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.30346
Memory Partition 1: 
Cache L2_bank_002:
MSHR contents

Cache L2_bank_003:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[1]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67854 n_act=15 n_pre=1 n_req=545 n_rd=708 n_write=212 bw_util=0.02675
n_activity=5886 dram_eff=0.3126
bk0: 16a 68532i bk1: 12a 68588i bk2: 0a 68788i bk3: 0a 68788i bk4: 20a 68735i bk5: 20a 68733i bk6: 64a 68638i bk7: 64a 68644i bk8: 64a 68647i bk9: 64a 68644i bk10: 64a 68645i bk11: 64a 68651i bk12: 64a 67904i bk13: 64a 67987i bk14: 64a 67318i bk15: 64a 67259i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.334918
Memory Partition 2: 
Cache L2_bank_004:
MSHR contents

Cache L2_bank_005:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[2]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67873 n_act=14 n_pre=0 n_req=530 n_rd=704 n_write=199 bw_util=0.02625
n_activity=5889 dram_eff=0.3067
bk0: 12a 68632i bk1: 8a 68675i bk2: 0a 68788i bk3: 0a 68789i bk4: 20a 68736i bk5: 24a 68729i bk6: 64a 68644i bk7: 64a 68646i bk8: 64a 68647i bk9: 64a 68651i bk10: 64a 68648i bk11: 64a 68648i bk12: 64a 67950i bk13: 64a 67719i bk14: 64a 67369i bk15: 64a 67364i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.253743
Memory Partition 3: 
Cache L2_bank_006:
MSHR contents

Cache L2_bank_007:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[3]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67880 n_act=14 n_pre=0 n_req=527 n_rd=704 n_write=192 bw_util=0.02605
n_activity=5725 dram_eff=0.313
bk0: 12a 68629i bk1: 8a 68661i bk2: 0a 68786i bk3: 0a 68786i bk4: 20a 68738i bk5: 24a 68729i bk6: 64a 68642i bk7: 64a 68643i bk8: 64a 68643i bk9: 64a 68649i bk10: 64a 68638i bk11: 64a 68650i bk12: 64a 67791i bk13: 64a 67660i bk14: 64a 67449i bk15: 64a 67385i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.27991
Memory Partition 4: 
Cache L2_bank_008:
MSHR contents

Cache L2_bank_009:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[4]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67869 n_act=14 n_pre=0 n_req=539 n_rd=704 n_write=203 bw_util=0.02637
n_activity=5871 dram_eff=0.309
bk0: 12a 68578i bk1: 8a 68708i bk2: 0a 68790i bk3: 0a 68790i bk4: 20a 68735i bk5: 24a 68730i bk6: 64a 68649i bk7: 64a 68649i bk8: 64a 68645i bk9: 64a 68641i bk10: 64a 68643i bk11: 64a 68647i bk12: 64a 67633i bk13: 64a 67711i bk14: 64a 67323i bk15: 64a 67376i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.317125
Memory Partition 5: 
Cache L2_bank_010:
MSHR contents

Cache L2_bank_011:
MSHR contents

In Dram Latency Queue (total = 0): 
DRAM[5]: 16 bks, busW=4 BL=8 CL=12, tRRD=2 tCCD=6, tRCD=12 tRAS=28 tRP=12 tRC=40
n_cmd=68790 n_nop=67872 n_act=14 n_pre=0 n_req=532 n_rd=704 n_write=200 bw_util=0.02628
n_activity=5855 dram_eff=0.3088
bk0: 12a 68631i bk1: 8a 68658i bk2: 0a 68787i bk3: 0a 68787i bk4: 20a 68731i bk5: 24a 68730i bk6: 64a 68647i bk7: 64a 68643i bk8: 64a 68645i bk9: 64a 68645i bk10: 64a 68648i bk11: 64a 68648i bk12: 64a 67817i bk13: 64a 67748i bk14: 64a 67487i bk15: 64a 67461i 
dram_util_bins: 0 0 0 0 0 0 0 0 0 0
dram_eff_bins: 0 0 0 0 0 0 0 0 0 0
mrqq: max=16 avg=0.286524

========= L2 cache stats =========
L2_cache_bank[0]: Access = 3621, Miss = 179, Miss_rate = 0.049, Pending_hits = 321, Reservation_fails = 4452
L2_cache_bank[1]: Access = 3181, Miss = 176, Miss_rate = 0.055, Pending_hits = 296, Reservation_fails = 4350
L2_cache_bank[2]: Access = 3450, Miss = 178, Miss_rate = 0.052, Pending_hits = 315, Reservation_fails = 4369
L2_cache_bank[3]: Access = 3206, Miss = 176, Miss_rate = 0.055, Pending_hits = 296, Reservation_fails = 4198
L2_cache_bank[4]: Access = 3229, Miss = 176, Miss_rate = 0.055, Pending_hits = 297, Reservation_fails = 4050
L2_cache_bank[5]: Access = 3178, Miss = 176, Miss_rate = 0.055, Pending_hits = 292, Reservation_fails = 3913
L2_cache_bank[6]: Access = 3180, Miss = 176, Miss_rate = 0.055, Pending_hits = 301, Reservation_fails = 4277
L2_cache_bank[7]: Access = 3209, Miss = 176, Miss_rate = 0.055, Pending_hits = 306, Reservation_fails = 3642
L2_cache_bank[8]: Access = 3181, Miss = 176, Miss_rate = 0.055, Pending_hits = 298, Reservation_fails = 3777
L2_cache_bank[9]: Access = 3209, Miss = 176, Miss_rate = 0.055, Pending_hits = 308, Reservation_fails = 4112
L2_cache_bank[10]: Access = 3193, Miss = 176, Miss_rate = 0.055, Pending_hits = 290, Reservation_fails = 3935
L2_cache_bank[11]: Access = 3199, Miss = 176, Miss_rate = 0.055, Pending_hits = 301, Reservation_fails = 4179
L2_total_cache_accesses = 39036
L2_total_cache_misses = 2117
L2_total_cache_miss_rate = 0.0542
L2_total_cache_pending_hits = 3621
L2_total_cache_reservation_fails = 49254
L2_total_cache_breakdown:
	L2_cache_stats_breakdown[GLOBAL_ACC_R][HIT] = 30369
	L2_cache_stats_breakdown[GLOBAL_ACC_R][HIT_RESERVED] = 3216
	L2_cache_stats_breakdown[GLOBAL_ACC_R][MISS] = 1408
	L2_cache_stats_breakdown[GLOBAL_ACC_R][RESERVATION_FAIL] = 48229
	L2_cache_stats_breakdown[CONST_ACC_R][HIT] = 116
	L2_cache_stats_breakdown[CONST_ACC_R][HIT_RESERVED] = 3
	L2_cache_stats_breakdown[CONST_ACC_R][MISS] = 1
	L2_cache_stats_breakdown[CONST_ACC_R][RESERVATION_FAIL] = 129
	L2_cache_stats_breakdown[GLOBAL_ACC_W][HIT] = 2348
	L2_cache_stats_breakdown[GLOBAL_ACC_W][HIT_RESERVED] = 391
	L2_cache_stats_breakdown[GLOBAL_ACC_W][MISS] = 704
	L2_cache_stats_breakdown[GLOBAL_ACC_W][RESERVATION_FAIL] = 552
	L2_cache_stats_breakdown[INST_ACC_R][HIT] = 465
	L2_cache_stats_breakdown[INST_ACC_R][HIT_RESERVED] = 11
	L2_cache_stats_breakdown[INST_ACC_R][MISS] = 4
	L2_cache_stats_breakdown[INST_ACC_R][RESERVATION_FAIL] = 344
L2_cache_data_port_util = 0.204
L2_cache_fill_port_util = 0.014

icnt_total_pkts_mem_to_simt=181168
icnt_total_pkts_simt_to_mem=45065
LD_mem_lat_dist  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ST_mem_lat_dist  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
----------------------------Interconnect-DETAILS--------------------------------
Class 0:
Packet latency average = 26.5755
	minimum = 6
	maximum = 856
Network latency average = 18.7063
	minimum = 6
	maximum = 797
Slowest packet = 1042
Flit latency average = 13.4921
	minimum = 6
	maximum = 797
Slowest flit = 2494
Fragmentation average = 0.0075187
	minimum = 0
	maximum = 332
Injected packet rate average = 0.0554852
	minimum = 0.0496412 (at node 1)
	maximum = 0.0694823 (at node 15)
Accepted packet rate average = 0.0554852
	minimum = 0.0496412 (at node 1)
	maximum = 0.0694823 (at node 15)
Injected flit rate average = 0.160782
	minimum = 0.0571056 (at node 1)
	maximum = 0.320317 (at node 15)
Accepted flit rate average= 0.160782
	minimum = 0.0703074 (at node 21)
	maximum = 0.233181 (at node 12)
Injected packet length average = 2.89775
Accepted packet length average = 2.89775
Total in-flight flits = 0 (0 measured)
====== Overall Traffic Statistics ======
====== Traffic class 0 ======
Packet latency average = 26.5755 (1 samples)
	minimum = 6 (1 samples)
	maximum = 856 (1 samples)
Network latency average = 18.7063 (1 samples)
	minimum = 6 (1 samples)
	maximum = 797 (1 samples)
Flit latency average = 13.4921 (1 samples)
	minimum = 6 (1 samples)
	maximum = 797 (1 samples)
Fragmentation average = 0.0075187 (1 samples)
	minimum = 0 (1 samples)
	maximum = 332 (1 samples)
Injected packet rate average = 0.0554852 (1 samples)
	minimum = 0.0496412 (1 samples)
	maximum = 0.0694823 (1 samples)
Accepted packet rate average = 0.0554852 (1 samples)
	minimum = 0.0496412 (1 samples)
	maximum = 0.0694823 (1 samples)
Injected flit rate average = 0.160782 (1 samples)
	minimum = 0.0571056 (1 samples)
	maximum = 0.320317 (1 samples)
Accepted flit rate average = 0.160782 (1 samples)
	minimum = 0.0703074 (1 samples)
	maximum = 0.233181 (1 samples)
Injected packet size average = 2.89775 (1 samples)
Accepted packet size average = 2.89775 (1 samples)
Hops average = 1 (1 samples)
----------------------------END-of-Interconnect-DETAILS-------------------------


gpgpu_simulation_time = 0 days, 0 hrs, 1 min, 56 sec (116 sec)
gpgpu_simulation_rate = 268836 (inst/sec)
gpgpu_simulation_rate = 449 (cycle/sec)
total time is 116577 ms
2021-06-14T16:20:41+08:00
